HUMAN DNA SEQUENCES
Background of the Invention 

Current methods for testing pharmacological substances rely on a three-stage testing 
approach to drug development. First, candidate compounds are typically screened in some 
sort of in vitro system, like inhibition of cancer cell growth. Candidates are then tested in 
an animal model, as a first approximation of systemic effects, including efficacy and 
toxicity. Compounds that still show promise after these initial in vivo screens are finally
tested in humans. Again, human testing typically occurs in three phases: toxicity;
preliminary efficacy; and efficacy. The entire process can take more than a decade and cost 
hundreds of millions of dollars. Aside from the monetary costs and protracted time scale, 
moreover, current testing regimes waste the lives of countless laboratory animals and 
needlessly endanger the lives of human subjects. 

A need exists, therefore, for more sophisticated drug screening techniques that can 
be done rapidly in vitro. These screening techniques ideally will be reflective of systemic 
and/or organ-specific responses, so that they provide a reliable indicator of action in a 
human body. Current techniques, however, tend to utilize only a single or limited number 
of markers, thus answering only very simple questions that are of questionable medical 
import. For example, a typical in vitro assay may ask whether a lead compound binds a 
particular receptor, which has been implicated in a certain disorder. It is presumed that 
such binding is indicative of therapeutic usefulness, but it does not even purport to address 
systemic effects. 

Not only are screening techniques for efficacy inadequate, the available toxicity 
screens likewise are inadequate. Toxicity, on a first level, is usually measured by animal 
testing. Aside from the complications related to in vivo versus in vitro testing, such screens 
are insufficient because of differences in metabolism, uptake, etc., relative to humans. 
Thus, improved methods would not only be in vitro-based, they would also be more
"human."

With the increasing miniaturization of screening assays and the growing availability 
of targets for pharmaceutical intervention, there is increasing interest in developing arrays 
containing large numbers of these targets that can be assayed simultaneously. If such an 
array contains a large enough population of targets, it can be used to essentially mimic the
systemic response. In other words, the array becomes an in vitro surrogate for the human 
body. The more refined the array, the more accurate the predictive capability. In theory, 
an array could be constructed that can detect all of the known human expression products 
simultaneously, thereby providing a very reliable indicator of the human response to a
given compound. These arrays offer advantages over the present in vitro screening systems 
in that they can assay large numbers of responses simultaneously. They are superior to 
animal testing because they are more "human" and, thus, more predictive of human 
responses. 

In order to construct such arrays, however, the field is in need of further human 
targets. Advantageously, such targets will be provided with additional physiologically 
relevant information, such as whether the target is expressed in a particular tissue and 
whether it is related to a known functional class of targets. In this way, the artisan can 
focus as needed, for example, on tissue-specific effects or target class-specific effects, 
thereby providing information useful in evaluating efficacy and/or toxicity. 

In addition to a need for pharmacological screening targets, there is a need for 
further pharmacological substances. These substances can be used in the formulation of 
medicinal compositions and in treating a wide variety of disorders. 

The present invention responds to the aforementioned and other needs in the field by 
providing a population of novel targets useful, inter alia, in the profiling and medicinal 
contexts described above. 

Summary of the Invention 

It is an object of the invention, therefore, to provide a set of human cDNA clones. 
Further to this object, the invention provides sequences of human cDNA clones that were 
isolated from libraries generated from different human tissues. 

It is another object of the invention to provide assemblages of targets useful in 
profiling matrices for screening pharmacological test compounds. According to this object, 
assemblages comprising different populations of human nucleic acids, proteins and 
antibodies are provided. In different embodiments, cDNA library-specific assemblages and 
target-family-specific targets are provided.

It is a further object of the invention to provide a database of human nucleotide and
protein sequences. Further to this object, novel human nucleotide and protein sequences 
are provided in electronic form. In one embodiment, one or more of these sequences is 
provided in a searchable database. 

It is still another object of the invention to provide biologically active target 
molecules useful in treating or detecting human disorders. Further to this object, the 
invention provides nucleic acid and protein molecules that have the capacity to affect 
disease etiology or symptoms or correlate with known disease states. Also further to this 
object, a database is provided which comprises the disclosed molecules in electronic form. 

It is still a further object of the invention to provide polypeptides encoded by the 
human cDNA clones disclosed herein. Further to this object, the invention provides 
antibodies and fragments thereof that are capable of binding to a specific portion of these 
polypeptides. 

It is yet another object of the invention to provide pharmaceutical compositions which 
comprise an effective amount of a pharmaceutical agent, wherein the pharmaceutical agent is 
selected from the group consisting of one or more polypeptides contemplated by the invention, 
variants or functional derivatives thereof, and antibodies thereto; and a physiologically 
acceptable carrier or excipient. 

It is still another object of the invention to provide expression vectors comprising one 
or more human cDNA clones disclosed herein or fragments thereof; and optionally a 
promoter operably linked to the cDNA clone or fragment thereof. Further to this object, the
invention provides methodology for recombinantly producing a desired peptide, comprising 
expressing in a host cell a peptide encoded by a human cDNA clone disclosed herein. 

Detailed Description 

The invention results from a need in the art for new human nucleic acids and proteins. 
This need arises in several contexts. First, there is a need to identify targets for therapeutic 
intervention. Second, there is a need to identify molecules that may be adversely affected in a 
therapeutic context, thereby resulting in toxicity. Knowledge of these molecules will aid in 
the design of new medicaments with enhanced efficacy and decreased toxicity. Finally, the
need encompasses human nucleic acids and proteins that have medicinal applicability in their 
own right. 

In view of these needs, the present inventors set out to isolate and sequence human 
cDNAs from tissue-specific libraries. The resulting cDNAs thus represent subsets of molecules likely
to be targets for therapeutic intervention or for avoiding toxicity. In addition, the inventors
divided the molecules into various sub-categories, based on suspected functionality, structural
similarity, etc., that are of interest from a pharmacological perspective. These molecules are
disclosed in provisional application serial nos. 60/149,499 and 60/156,503, filed August 18, 
1999, and September 28, 1999, respectively, both of which are hereby incorporated by 
reference in their entirety. 

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES 

The present invention provides novel polynucleotide molecules that, in some 
instances, have similarities with known molecules. The inventive DNAs were cloned from 
five different human cDNA libraries. In addition to these DNA molecules, the invention 
provides their protein translations and antibodies derived from them. The inventive DNA and 
protein sequences are shown individually, below. The inventive nucleic acids also include the
complements of these DNA sequences, as well as their RNA counterparts. Methods of 
producing the molecules also are provided. Further, the invention provides methods for 
detecting all or part of the molecules and of detecting polynucleotides encoding all or part of 
the molecules. 

The inventive molecules derive from five cDNA libraries: human fetal brain; human 
fetal kidney; human mammary carcinoma; human testis; and human uterus. For convenience, 
each sequence bears a designation that indicates from which library it is derived. In 
particular, these designations are: "hfbr" for human fetal brain; "hfkd" for human fetal
kidney; "hmcf" for human mammary carcinoma; "htes" for human testis; and "hute" for 
human uterus. The individual libraries were constructed and screened as described below in 
the examples. 

The protein and DNA molecules of the invention are variously described herein as 
"target" molecules or "inventive" molecules. The sequences and other information pertinent 
to the nucleic acid and protein molecules of the invention are shown, below.

Interpreting the data disclosed with the Table and cDNA sequences, below:

The table and data below provide the coding sequences of the inventive cDNAs as 
well as the protein sequences and other useful information, as set out below. 

Grouping 

The clones were assigned to the following fourteen functional and/or tissue-derived
groups:

1. Cell Cycle 

2. Cell Structure and Motility 

3. Differentiation/Development 

4. Intracellular Transport and Trafficking 

5. Metabolism 

6. Nucleic Acid Management 

7. Signal Transduction 

8. Transmembrane Protein 

9. Transcription Factors 

10. Brain derived 

11. Kidney derived

12. Mammary Carcinoma derived 

13. Testes derived 

14. Uterus derived 

Description of Clone Files 

The individual clone files are structured in the same pattern. The Sections are 
separated by paragraphs. 

1. Clone Name 

The clone names are deciphered with reference to the following example: 
DKFZphfkd2_24e23, wherein the code represents: 

• producer of library ("DKFZ") (for convenience, this reference may be 
eliminated) 

• a "p" for "plasmid cDNA library" (for convenience, this reference may be 
eliminated) 

• library name (e.g. hfbr = human fetal brain; hfkd = human fetal kidney; hmcf =
human mammary carcinoma; htes = human testes; hute = human uterus) 

• an underscore ("_") to separate library information from plate information 

• plate number (e.g. "16")

• plate coordinates (letter first; e.g. "f14")

2. Group 

3. Introduction

short review of the similarities, function of the protein and possible applications 

4. Short Information 

specifications about the cDNA (who sequenced, completeness of the cDNA, similarity,
chromosomal localisation, length of cDNA, localisation of poly A tail and
polyadenylation signal)

5. cDNA-Sequence 

6. BLASTn Results 

search results of blasting the cDNA sequence against all public databases 

7. Medline Entries 

information about genes/proteins similar to the novel cDNA (if available) 

8. Putative Encoded Protein Information 

specifications about the encoded protein (ORF: length and localisation of the reading frame) 

9. Protein Sequence 

10. BLASTp Results 

search results of blasting the protein sequence against all public databases 

11. Pedant Information 

output of fully automated annotation: summarises peptide information, homologies, patterns 
as follows: 

[Length] 

- length of the protein = number of amino acid residues 

[MW] 
- molecular weight of the protein

[Pi] 

- isoelectric point 
[HOMOL] 

- shows protein with closest similarity to the cDNA-encoded protein 
[FUNCAT] 

- functional information according to a catalogue developed by the Munich
Information Center for Protein Sequences (MIPS)

[BLOCKS] 

- Blocks are multiply aligned ungapped segments corresponding to the most 
highly conserved regions of proteins. The blocks for the Blocks Database are made 
automatically by looking for the most highly conserved regions in groups of proteins 
documented in the Prosite Database. The Prosite pattern for a protein group is not 
used in any way to make the Blocks Database and the pattern may or may not be 
contained in one of the blocks representing a group. These blocks are then calibrated 
against the SWISS-PROT database to obtain a measure of the chance distribution of 
matches. It is these calibrated blocks that make up the Blocks Database. The WWW 
versions of the Prosite and SWISS-PROT Databases that are used on this server are 
located at the ExPASy World Wide Web (WWW) Molecular Biology Server of the 
Geneva University Hospital and the University of Geneva. World Wide Web URL 
http://blocks.fhcrc.org/blocks/about_blocks.html/ is the entry point to the database. 

- here Blocks segments found in the analysed protein sequences are displayed 
[SCOP] 

Nearly all proteins have structural similarities with other proteins and, in some 
of these cases, share a common evolutionary origin. The scop database provides a 
detailed and comprehensive description of the structural and evolutionary 
relationships between all proteins whose structure is known, including all entries in 
Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of 
tightly linked hypertext documents which make the large database comprehensible 
and accessible. In addition, the hypertext pages offer a panoply of representations of 
proteins, including links to PDB entries, sequences, references, images and interactive 
display systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/ is the 
entry point to the database. Existing automatic sequence and structure comparison
tools cannot identify all structural and evolutionary relationships between proteins. 
The scop classification of proteins has been constructed manually by visual inspection 
and comparison of structures, but with the assistance of tools to make the task 
manageable and help provide generality. Proteins are classified to reflect both 
structural and evolutionary relatedness. Many levels exist in the hierarchy, but the 
principal levels are family, superfamily and fold. The exact positions of the boundaries
between these levels are to some degree subjective. Scop evolutionary classification is
generally conservative: where any doubt about relatedness exists, we made new 
divisions at the family and superfamily levels. 

- here SCOP segments found in the analysed protein sequences are
displayed

[EC] 

ENZYME is a repository of information relative to the nomenclature of 
enzymes. It is primarily based on the recommendations of the Nomenclature 
Committee of the International Union of Biochemistry and Molecular Biology 
(IUBMB) and it describes each type of characterized enzyme for which an EC 
(Enzyme Commission) number has been provided. World Wide Web URL 
http://www.expasy.ch/enzyme/ is the entry point to the database. 

- here EC-number and name of enzymes with similarity to the analysed protein 
sequences are displayed 

[PIRKW] 

- functional information according to the Protein Information Resource (PIR) 
database catalogue developed by Munich Information Center for Protein Sequences 
(MIPS), the National Biomedical Research Foundation (NBRF) and the International 
Protein Information Database in Japan (JIPID). 

[SUPFAM] 

- information according to the Protein Information Resource (PIR) database 
catalogue of protein superfamilies developed by Munich Information Center for 
Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) 
and the International Protein Information Database in Japan (JIPID). 
[PROSITE] 
please refer to 12. PROSITE Motifs
[PFAM] 

please refer to 13. PFAM Motifs 

[KW] 

- overall two-dimensional folding information

- 3D indicates that the protein is similar to a protein for which a three-dimensional
structure is known

- overall structural information 

[] 

The last PEDANT-block depicts information about the folding structure of the 
protein generated by PREDATOR. PREDATOR is a secondary structure prediction 
program. It takes as input a single protein sequence to be predicted and can optionally
use a set of unaligned sequences as additional information to predict the query 
sequence. The mean prediction accuracy of PREDATOR is 68% for a single sequence 
and 75% for a set of related sequences. PREDATOR does not use multiple sequence 
alignment. Instead, it relies on careful pairwise local alignments of the sequences in 
the set with the query sequence to be predicted. 

World Wide Web URL
http://www.embl-heidelberg.de/argos/predator/predator_info.html is the entry point to the database.

- H = helix, E = extended or sheet, = coil, T = transmembrane, B = beta 

- x indicates a low-complexity region with repeat-like structure which is 
omitted in all BLAST searches 

12. PROSITE Motifs 

PROSITE is a database of protein families and domains. It consists of biologically significant 
sites, patterns and profiles that help to reliably identify to which known protein family (if 
any) a new sequence belongs. World Wide Web URL http://www.expasy.ch/prosite/ is the 
entry point to the database. A description of the prosite consensus patterns is also provided, 
below. 

13. PFAM Motifs 

PFAM (protein families) is a large collection of multiple sequence alignments and hidden 
Markov models covering many common protein domains. World Wide Web URL
http://www.sanger.ac.uk/Pfam/ is the entry point to the database. 



Deposit of Clones 

Clones were deposited as a pool with the American Type Culture Collection under
accession number __________, from which each clone comprising a particular
polynucleotide is obtainable. Each clone has been transfected into separate bacterial cells (E.
coli) in this composite deposit.

The clones may also be obtained from the Resource Center of the German Human
Genome Project (Heubner Weg 6, 14059 Berlin, GERMANY). The Resource Center library
numbers are slightly different than those presented here, but may be readily obtained by the
following key or with the assistance of Resource Center personnel.

The library name becomes a number: brain (hfbr2) becomes 564; kidney (hfkd2)
becomes 566; mammary carcinoma (hmcf1) becomes 727; testis (htes3) becomes 434; and
uterus (hute1) becomes 586. Next, the plate number is converted to two digits (e.g., "2"
becomes "02") and is moved behind the plate coordinate, and the underscore is dropped. The
following examples are helpful:

Listed Number          Resource Center Number

DKFZphfbr2_16f21       DKFZp564F2116
DKFZphfkd2_1j9         DKFZp566J091
DKFZphmcf1_1c23        DKFZp727C231
DKFZphtes3_14g5        DKFZp434G0514
DKFZphute1_17k7        DKFZp586K0717

The libraries were constructed using two commercially available vectors. The brain
(hfbr2 designations) and kidney (hfkd2 designations) libraries utilize pAMP1 from Life
Technologies and are maintained in XL-2Blue (Stratagene); the uterus (hute1), testes (htes3)
and mammary carcinoma (hmcf1) libraries are constructed in pSPORT1, also from Life
Technologies, and are maintained in DH10B (Life Technologies). In addition to the following
techniques, consultation with the commercial literature available on these clones will make
evident all of the housekeeping techniques needed to propagate and isolate the individual
constructs. All inserts may be excised with a NotI/SalI digestion. Alternatively, universal
primers, flanking the cloning region, may be used to amplify the inserts using PCR methods.
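The Resource Center naming key given above lends itself to a short script. The following Python sketch converts a listed clone name into its Resource Center number. It follows the five worked examples in the table, in which the coordinate letter is capitalised, the coordinate number is written with two digits, and the plate number is appended as listed; that reading of the key, the function name to_resource_center_number and the lookup table LIBRARY_NUMBERS are assumptions made for illustration only.

```python
import re

# Library designation -> Resource Center library number (taken from the key).
LIBRARY_NUMBERS = {
    "hfbr2": "564",  # human fetal brain
    "hfkd2": "566",  # human fetal kidney
    "hmcf1": "727",  # human mammary carcinoma
    "htes3": "434",  # human testis
    "hute1": "586",  # human uterus
}

# Optional "DKFZp" prefix, library name, underscore, plate number,
# then the plate coordinate (letter first, followed by a number).
_CLONE_NAME = re.compile(r"^(?:DKFZp)?([a-z]{4}\d)_(\d+)([a-z])(\d+)$")


def to_resource_center_number(listed_name):
    """Convert a listed clone name to a Resource Center number (illustrative)."""
    match = _CLONE_NAME.match(listed_name)
    if match is None:
        raise ValueError(f"unrecognised clone name: {listed_name!r}")
    library, plate, row, column = match.groups()
    if library not in LIBRARY_NUMBERS:
        raise ValueError(f"unknown library: {library!r}")
    # Assumption drawn from the worked examples: the coordinate number is
    # zero-padded to two digits and the plate number follows it unchanged.
    return (
        f"DKFZp{LIBRARY_NUMBERS[library]}"
        f"{row.upper()}{int(column):02d}{int(plate)}"
    )
```

Because the "DKFZ" and "p" prefixes may be omitted from a listed name, the sketch accepts names with or without them. Any result should be confirmed with Resource Center personnel before a clone is ordered.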
Bacterial cells containing a particular clone can be obtained from the composite
deposit as follows: 

An oligonucleotide probe or probes should be designed to the sequence that is known 
for that particular clone. This sequence can be derived from the sequences provided herein, 
or from a combination of those sequences. Methods of probe design are presented below. 

Oligonucleotide probes may be labeled with [γ-32P]ATP (specific activity 6000
Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling 
oligonucleotides. Other, non-radioactive labeling techniques can also be used. 
Unincorporated label typically is removed by gel filtration chromatography or other 
established methods. The amount of radioactivity incorporated into the probe can be 
quantified by measurement in a scintillation counter. Preferably, the specific activity of the
resulting probe should be approximately 4 x 10^6 dpm/pmole.

The bacterial culture containing the pool of full-length clones should preferably be 
thawed and 100 ul of the stock used to inoculate a sterile culture flask containing 25 ml of 
sterile L-broth containing ampicillin at 50-100 ug/ml (for XL-2Blue strains 25 ug/ml
tetracycline should also be used). The culture should preferably be grown to saturation at
37°C, and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of
these dilutions should preferably be plated to determine the dilution and volume which will
yield approximately 5000 distinct and well-separated colonies on solid bacteriological media
containing L-broth containing ampicillin at 100 ug/ml (for XL-2Blue strains 25 ug/ml
tetracycline should also be used) and agar at 1.5% in a 150 mm petri dish when grown
overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can 
also be employed. 

Standard colony hybridization procedures should then be used to transfer the colonies 
to nitrocellulose filters and lyse, denature and bake them. The filter is then preferably 
incubated at 65°C for 1 hour with gentle agitation in 6 x SSC (20 x stock is 175.3 g
NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100
ug/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter).
Preferably, the probe is then added to the hybridization mix at a concentration greater than or
equal to 1 x 10^6 dpm/mL. The filter is then preferably incubated at 65°C with gentle agitation
overnight. The filter is then preferably washed in 500 mL of 2 x SSC/0.5% SDS at room
temperature without agitation, preferably followed by 500 mL of 2 x SSC/0.1% SDS at room
temperature with gentle shaking for 15 minutes. A third wash with 0.1 x SSC/0.5% SDS at
65°C for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to
autoradiography for sufficient time to visualize the positives on the X-ray film. Other known 
hybridization methods can also be employed. 
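The quantities called for in the hybridization procedure follow from simple proportions. As a minimal Python sketch, the two helper functions below (hypothetical names, not part of any laboratory software) compute how much 20 x SSC stock is needed for a given volume of 6 x SSC, and how much labeled probe is needed to reach 1 x 10^6 dpm/mL, assuming the preferred probe specific activity of about 4 x 10^6 dpm/pmole stated earlier.

```python
def stock_volume_ml(final_volume_ml, final_strength_x, stock_strength_x=20.0):
    """Volume of concentrated SSC stock needed for the requested dilution."""
    if final_strength_x > stock_strength_x:
        raise ValueError("cannot dilute to a strength above that of the stock")
    return final_volume_ml * final_strength_x / stock_strength_x


def probe_needed_pmol(mix_volume_ml, target_dpm_per_ml=1e6,
                      specific_activity_dpm_per_pmol=4e6):
    """Picomoles of labeled probe giving the target activity in the mix."""
    total_dpm = mix_volume_ml * target_dpm_per_ml
    return total_dpm / specific_activity_dpm_per_pmol


# Example for one 150 mm filter (approximately 10 mL of hybridization mix).
ssc_stock = stock_volume_ml(10.0, 6.0)   # mL of 20 x SSC in 10 mL of 6 x SSC
probe = probe_needed_pmol(10.0)          # pmol of probe for 10 mL of mix
```

For a single filter this works out to 3 mL of 20 x SSC stock and 2.5 pmole of probe; a probe of lower specific activity requires proportionally more material.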

The positive colonies are picked, grown in culture, and plasmid DNA isolated using 
standard procedures. The clones can then be verified by restriction analysis, hybridization 
analysis, or DNA sequencing. 

Alternatively, clones may be grown as described above, and PCR used to isolate the 
insert DNAs. Methods of PCR are described below and are otherwise well known.

ERROR SCREENING 

The DNA sequences found herein derive from individual clones, which are publicly 
available, as noted above. Thus, the skilled artisan will recognize that any specific sequence 
disclosed herein readily can be screened for errors by resequencing a particular fragment, in 
both directions (i.e., by sequencing both strands). Alternatively, error screening can be 
performed by amplifying and/or cloning any of the inventive DNAs, using for example RT- 
PCR, and sequencing the resulting amplified product. In the event that there is a sequencing 
error, reference should be made to the deposited clone as the correct sequence. 
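Sequencing both strands gives two independent reads of the same fragment, and a simple comparison can flag the positions that need closer inspection. The Python sketch below is one illustrative way to make that comparison; it assumes the two reads cover exactly the same region and have already been trimmed to equal length, and the function names are hypothetical.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def strand_discrepancies(forward_read, reverse_read):
    """Return 0-based positions where the two strand reads disagree.

    The reverse-strand read is reverse-complemented so that it can be
    compared base by base with the forward-strand read.
    """
    converted = reverse_complement(reverse_read).upper()
    forward = forward_read.upper()
    if len(forward) != len(converted):
        raise ValueError("reads must cover the same region (equal length)")
    return [i for i, (a, b) in enumerate(zip(forward, converted)) if a != b]
```

An empty result means the two strands agree over the whole fragment. Any reported position is only a candidate error, to be resolved against the deposited clone as described above.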

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible to a wide variety of uses, 
based on functional and/or structural properties. The skilled worker will appreciate, based on 
the biological activities detailed below, and discussed with regard to the individual sequences 
disclosed below, that the inventive molecules will find usefulness in numerous therapeutic and 
diagnostic applications. 

The DNA molecules, especially the potassium salts thereof, can be used as fertilizer 
supplements due to their high nitrogen and phosphorus contents. Since the DNAs are of 
defined length, they are also useful in gel electrophoresis as molecular weight markers. Due 
to their similarity with known molecules, certain of the DNA molecules and their variants and 
derivatives may be used in any number of different diagnostic procedures and therapeutic 
applications. They may also be used to make the encoded proteins.

The proteins themselves have many possible uses. They may be used as a nutritional
supplement for humans, animals and even for laboratory use as, for example, medium for 
bacterial cultures. Moreover, since the proteins are of defined, known sizes, they may be 
used as molecular weight markers for gel electrophoresis and gel filtration. Because they are 
of defined sequences, they also have use in microsequencing and protein fingerprinting 
applications. 

Expression Profiling Applications 

Given their known tissue expression and functional associations, assemblages of the 
inventive proteins (or corresponding antibodies) and nucleic acids are particularly suited to 
expression profiling applications. Expression profiling generally entails constructing an array 
of indicators that signal the presence of a particular RNA or protein expression product. Such 
arrays can be used to evaluate, for example, pharmacological effectiveness and toxicity. In 
particular, expression profiles from such arrays can be generated from cells treated with 
known compounds, having known properties, and these profiles can be compared to profiles 
of unknowns to evaluate similarities and differences, which can be correlated with efficacy or 
toxicity. 
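The comparison of an unknown profile against profiles of known compounds can be illustrated with a small calculation. The Python sketch below treats each profile as a list of signal intensities, one per array position, and ranks reference compounds by Pearson correlation with the unknown. The choice of Pearson correlation, the function names and the data layout are assumptions made for illustration; no particular similarity measure is prescribed here.

```python
from math import sqrt


def pearson(profile_a, profile_b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(profile_a)
    if n != len(profile_b) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


def rank_references(unknown, references):
    """Rank known-compound profiles by similarity to the unknown profile.

    `references` maps a compound label to its profile; the result is a list
    of (label, correlation) pairs, most similar first.
    """
    scores = [(label, pearson(unknown, profile))
              for label, profile in references.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A test compound whose profile correlates strongly with that of a compound of known toxicity would be a candidate for closer toxicological scrutiny; a real analysis would also need normalisation and replicate measurements, which this sketch omits.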

Additional uses of profiling include diagnosis, tracking development, and ascertaining 
signaling and metabolic pathways. For examples of references describing profiling and its 
uses, see Farr et al., U.S. Patent No. 5,811,231 (1998); Seilhamer et al., U.S. Patent No.
5,840,484 (1998); Rine et al., U.S. Patent No. 5,777,888 (1998); WO 97/27317; WO 99/05323;
WO 99/09218; and WO 99/14369. For a device for implementing such techniques, see Lipshutz
et al., U.S. Patent No. 5,856,174 (1999) and Anderson et al., U.S. Patent No. 5,922,591
(1999).

In one embodiment, a subset of the inventive DNAs will be arrayed on a substrate, 
like a gene chip, a filter or a 96-well plate. Test samples containing cells are maintained in
the presence of a label capable of incorporation into nascent mRNA. Samples are treated with 
test and control compounds, which will induce mRNA expression in the sample, resulting in 
incorporation of label. Whole mRNA is isolated and applied to the array such that it 
hybridizes with the DNAs contained therein. After washing, the amount of hybridization is 
quantified and a profile is generated. These steps are repeated with various control and test 
compounds, thereby generating a library of profiles, which can be used to ascertain the 
relationships relevant to pharmacological efficacy or toxicity. 
The matrices used in such profiling, however, need not be limited to those utilizing
DNAs. Rather, other nucleic acids, like RNAs and peptide nucleic acids (PNAs), as well as
the inventive proteins and antibodies corresponding to the inventive proteins may also be 
employed. Hence, for example, antibodies could form the array and the samples could be 
treated in order to label nascent proteins. Whole proteins then would be isolated and applied 
to the antibody matrix. Developing the resulting signal would result in a protein expression 
profile, which is useful in essentially the same manner as the nucleic acid profile. A protein 
matrix could be used, for example, in evaluating antibody responses to pharmaceutical agents 
in order to eliminate possible cross-reactivity. 

Moreover, where nucleic acids are used in the matrix, it is often beneficial to use 
variants (as defined below) of the molecules described herein. This can be used to account 
for genetic variations that are of little or no consequence to the function of the resultant gene 
product. Hence, they can account for wobble or conservative amino acid variations that do 
not perturb function, like variations in some of the protein motifs elucidated below. Thus, 
each position in the matrix can employ multiple nucleic acid probes that account for a series 
of variants. 
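The idea of covering wobble variation with multiple probes per position can be made concrete by enumerating the coding sequences that encode the same short peptide. The Python sketch below does so with the standard genetic code; the function name synonymous_probes and the use of a short peptide as input are illustrative assumptions, and practical probe design would also weigh probe length, codon usage and hybridization behaviour.

```python
from itertools import product

# Standard genetic code, with codons ordered by base (T, C, A, G) at
# positions 1, 2 and 3; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid (one-letter code) -> list of codons that encode it.
CODONS_FOR = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(amino_acid, []).append("".join(codon))


def synonymous_probes(peptide):
    """List every DNA coding sequence that encodes the given peptide."""
    try:
        choices = [CODONS_FOR[residue] for residue in peptide.upper()]
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]!r}") from err
    return ["".join(codons) for codons in product(*choices)]
```

The number of variants is the product of the codon counts of the residues, so it grows quickly with peptide length; in practice the enumeration would be restricted to a short, informative stretch of sequence.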

Expression profiling may also be done, in another embodiment, using two- 
dimensional protein gels in which the inventive proteins are detected. The resultant profiles 
can be used in the same way as described. 

Matrices useful for profiling may be constructed based on different criteria. Of
course, the more relevant profiles will take into account expression of most human genes,
preferably all of them. In certain situations, however, it is advantageous to look at a smaller
subset. For example, if one were concerned about fetal neural toxicity, a fetal brain-specific
matrix might be chosen. On the other hand, if one were interested in targeting mammary
carcinoma tissue, a corresponding matrix could be used. Thus, matrices may be constructed
using all of the sequences available from a tissue-specific library.

* * * 

The following discussion relates to some of the various functional and structural 
groupings that would be of interest to the artisan wishing to construct profiling matrices. 
Of course, the artisan will also recognize that these functional descriptions may find
additional applicability in the therapeutic and diagnostic applications discussed below.

Cell Cycle

A proliferating cell must coordinate replication and chromosomal separation to ensure 
that the genome is replicated completely, and that a single copy is correctly inherited by each
daughter cell. The cell cycle is the coordinated series of events that achieves these aims.
Many of the key events are initiated by a family of conserved serine/threonine protein
kinases, the cyclin-dependent kinases (CDKs), which are activated by the cyclin family of
proteins (cyclins A-H). In turn, the cyclin-CDK complexes are modulated by other protein 
kinases or phosphatases, and by binding specific inhibitor proteins. The enormous variety of 
ways in which CDK activity can be regulated allows the cell to respond to internal signals 
generated by preceding events in the cell cycle and to external growth signals. 

The somatic cell cycle is divided into four phases: DNA replication (S phase) and 
chromosome separation (M phase) are separated by gap phases (G1 and G2). At specific
control points the decision to begin the next stage (DNA synthesis or mitosis) is carefully 
regulated. 
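
The ordering of the four phases and the two regulated entry points described above can be written down as a small sketch. The names used here (PHASES, CONTROL_POINTS, next_phase, regulated_entry) are hypothetical and serve only to restate the paragraph; they do not model any kinetics or regulation.

```python
# Phase order of the somatic cell cycle as described in the text.
PHASES = ("G1", "S", "G2", "M")

# Stages whose onset is described as carefully regulated.
CONTROL_POINTS = {"S": "DNA synthesis", "M": "mitosis"}


def next_phase(phase):
    """Return the phase that follows the given one (M wraps around to G1)."""
    index = PHASES.index(phase)
    return PHASES[(index + 1) % len(PHASES)]


def regulated_entry(phase):
    """Return the regulated stage entered next, or None if there is none."""
    return CONTROL_POINTS.get(next_phase(phase))
```

In this sketch, a cell in G1 or G2 faces a control point before proceeding, whereas the transitions out of S and M are not marked as regulated decisions.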

Cdc2, the primary kinase, is especially required for the G1-S transition and S phase. 
Cdk4 and Cdk6 are involved at the restriction point, where the cell can decide to proliferate or 
arrest (G1<->G0), and Cdk7 is a CDK activating kinase (CAK) as well as a subunit of TFIIH. 

The cyclin-CDK complexes are regulated in various ways. One is through 
phosphorylation by kinases, such as the CDK activating kinases (CAK) and the Y15 kinase 
Wee1, and dephosphorylation by CDK associated phosphatases (CAP), like Cdc25A, a 
member of the Cdc25 family (Cdc25A, B and C). 

Another way of regulation occurs through two classes of CDK inhibitors (CKI): first, the 
INK4 proteins p15, p16, p18, and p19, which negatively regulate the cyclin D-CDK 
complexes, and second, the p21 family with p21, p27, and p57. 

The cell cycle is also regulated through ubiquitin-mediated proteolysis involving the 
destruction of both cyclins and CDK inhibitors by the 26S proteasome, which requires a 
ubiquitin conjugating enzyme (UBC) and a ubiquitin ligase. The instability is conferred by 
PEST regions (cyclin D and E) or a ten amino acid region in the amino terminus (degradation 
box) in the A- and B-type cyclins. 

All these modifications play an important role in cellular localization, because 
only the nuclear CDK-cyclin complexes are functional in the cell cycle. During G1 phase of the 
cell cycle, cyclins A, E and D are synthesized and bind to their cyclin-dependent kinase 
(CDK) partners. CDK complexes containing cyclins A, E and D1 are then imported into and 
concentrated within nuclei. Cdk6-cyclin D3 has been localized to both cytoplasmic and 
nuclear compartments, although only the nuclear complex is active. As cells enter S phase, 
cyclin A and cyclin E complexes remain within the nucleus, whereas cyclin D1 relocalizes to 
the cytoplasm for proteolysis at the onset of S phase. Like Cdk2-cyclin A, Cdc2-cyclin A is 
nuclear and remains so until it is degraded during mitosis. By contrast, as a result of ongoing 
nuclear import and more rapid re-export, cyclin B1, which binds to Cdc2 upon synthesis 
during S phase, is predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic, although 
this might occur through anchoring of the complex to some cytoplasmic constituent. At 
prophase, phosphorylation of cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the 
nucleus, whereas cyclin B2 remains in the cytoplasm until nuclear envelope breakdown. 

Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C, are responsible 
for the G2 to M control point. Wee1 is a nuclear protein throughout the cell cycle, whereas 
Cdc25C binds to 14-3-3 proteins during interphase and remains predominantly cytoplasmic. 
In some systems Cdc25C, like cyclin B1, rushes precipitously into the nucleus just before 
entry into mitosis. 

The 110-kDa retinoblastoma (tumor suppressor) protein (RB), a pRB-family member, 
is an important regulator of cell-cycle progression and differentiation. Like the E2F family 
(E2F1-5) or DP family (DP1-3) of transcription activators, RB suppresses inappropriate 
proliferation by arresting cells in G1 by repressing the transcription of genes required for the 
transition into S phase. Before the cell proceeds into S phase, RB becomes phosphorylated at 
multiple sites by the cyclin dependent protein kinases (CDKs) and loses its transcriptional 
repressing activity. Phosphorylation of RB during late G1 phase results in the dissociation of 
the E2F-RB repressor complex, which allows S-phase specific genes to be transcribed. Cyclin 
E is the evolutionarily conserved target for E2F and interacts together with CDC2 in late G1. 

For a proliferating cell it is vital that only undamaged DNA is replicated because, if 
DNA damage is substantial, its replication can lead to chromosome loss or rearrangement. 
Thus, we find a G1<->S checkpoint in late G1 that requires tumor suppressor p53. A 
p53-dependent G1 arrest is effected by the cyclin dependent kinase inhibitor p21, whose higher 
expression levels inhibit almost all cyclin-CDK complexes. 

The kinase responsible for phosphorylating the unidentified kinetochore component 
in metaphase may be a member of the MAP kinase family and appears to be the 
proto-oncogene c-MOS, a cytostatic factor (CSF) in meiosis. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Cell cycle" and include, among others, the following: 

Tumor suppressors (e.g. N33) : Tumor-suppressor genes are known to be involved in 
the control of cell growth and division, interacting with proteins which control the cell cycle. 
The N33 gene is significantly methylated in tumor cells, a mechanism by which 
tumor-suppressor genes are inactivated in cancer. The N33 gene has been reported by OMIM 
(Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/htbin-post/Omin) to 
be associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following disease: prostate cancer suppression (OMIM *601385). Clones in this category 
include: fbr2_2kl4. 

C-TAK1 Cdc25C associated protein kinase : Cdc25C is a protein phosphatase that controls 
entry into mitosis by dephosphorylation of Cdc2. Cdc25C function is itself regulated by 
phosphorylation. Serine 216 phosphorylation of Cdc25C mediates the binding of 14-3-3 
protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein kinase) phosphorylates 
Cdc25C on serine 216 in vitro. Alterations in the gene coding for the above protein kinase 
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with pancreatic cancer (OMIM *60278). Clones in this category 
include: tes3_7j3. 

Cell structure and motility 

One of the major differences between prokaryotes and eukaryotes is the ability of the 
eukaryotic cell to adopt very different shapes dependent on its function during the 
differentiation process. Animal cells vary from round to extended cylindrical forms like 
motor neurons or muscle cells. In humans, more than 100 different cell types can be 
distinguished, each having a characteristic shape. The form of a cell often is closely related to 
its capacity to move. Some completely differentiated cells like fibroblasts can still change 
their form actively, thereby migrating. Other cell types serve as motor elements - 
"macroscopically" like muscle cells or "microscopically" like ciliated epithelia. Such tasks 
are fulfilled by a large class of proteins, responsible on the one hand for maintenance of cell 
structure and contact with neighboring cells or the intercellular matrix and on the other hand for 
cell motility. These topics cannot be regarded separately: the motility apparatus, for example, 
must be anchored in the cytoskeleton. Three different types of filaments can be distinguished: 
actin filaments, tubulin filaments and intermediate filaments, each present in almost all types of 
cells. 

Actin filaments (F-actin) are built up of monomers (G-actin). In muscle cells, actin and 
myosin, for both of which several paralogous genes are known, as well as many more 
proteins, are constituents of the contractile apparatus. 

The "thin" and "thick filaments" in a muscle cell consist mainly of actin and myosin, 
respectively. 

Several different proteins are responsible for the anchoring of the actin filaments in 
the Z-disks (e.g. alpha-actinin and desmin) or at the end of the myofibers in the cell 
membrane. 

Troponin I, C and T and tropomyosin, associated with actin, confer the Ca++-dependent 
triggering of contraction. 

Length of the sarcomere is controlled by the giant protein titin. 

In smooth muscle, there is no troponin. Contraction activity is controlled by 
phosphorylation / dephosphorylation of myosin by a specialized kinase instead. Contractile 
fibers are not organized in sarcomeres. 

Apart from contributing to muscle contraction, the actomyosin system is responsible 
for many other motions at cellular level, e.g. the amoeboid movement of pseudopodia or the 
fission of cells at the end of mitosis by a contractile ring. 

Besides this, actin fibers fulfill structural tasks like maintenance of the shape of 
stereocilia or microvilli. Here, actin filaments are connected by proteins like fimbrin. Actin 
fibers are not confined to such specialized structures, however. There is a network 
covering the complete cell volume with F-actin as a major constituent. Whereas the actin 
filaments in the structures mentioned above are relatively stable, this F-actin is highly 
dynamic. Management of the network structure and turnover is achieved by connecting 
proteins like alpha-actinin, fimbrin or filamin; turnover is regulated by gelsolin, villin, and 
different capping and fragmentation proteins. 

Microtubules are built up of alpha-beta tubulin heterodimers. Turnover of filaments is 
achieved by incorporation and release of subunits at different rates at the two 
ends. The resulting cycle is called "treadmilling". Thirteen strings of tubulin dimers build up 
one subfiber, whereas one fiber contains two or three of those. A complete axoneme consists 
of 9 radial and 2 central fibers. This "9+2" structure is the basis of flagella, their basal 
bodies and centrioles. In flagella, several additional structures like radial elements exist. 
Nexin connects the fibers and dynein is the motor ATPase which shifts the fibers relative to 
each other. Several genetic diseases like Kartagener syndrome are caused by 
deficiencies of distinct proteins in cilia. 

Besides this, microtubules are abundant in all types of cells. They are part of a 
delivery system for organelles, e.g. in the Golgi apparatus. A further very important system 
based on microtubules is the mitotic spindle, which is organized by the centrosomes. Besides 
many other components, the major part of a centrosome consists of two centrioles, which are 
built up of nine microtubule triplets. Most remarkably, new centrioles are not synthesized de 
novo but generated by duplication of old ones. 

Cytoplasmic microtubules are associated with many different proteins. Two major 
classes are known: the MAPs ("microtubule-associated proteins", with molecular masses 
between 200 and 300 kD) and the much smaller tau proteins, with molecular masses between 
60 and 70 kD. These proteins regulate the treadmilling process and the interaction with other structures in 
the cell. 

Besides actin filaments and microtubules, the so-called intermediate filaments constitute a third class 
of filaments. In contrast to the former two groups, they do not participate in motility, nor are 
they dynamic structures subject to rapid turnover. The most important ones are 
neurofilaments (in neurons), keratin filaments (mainly in epithelial cells), and vimentin 
filaments (in many different cell types). 



The biological function of both the cytoskeleton and the contractile apparatus of a 
cell does not end at the cell membrane. Cells must be embedded in the extracellular matrix, 
all cells of a muscle must act as one single mechanical unit and epithelia must resist 
macroscopic mechanical forces. Hence, cell adhesion and the extracellular matrix are closely 
connected to the cytoskeleton. Vinculin is one of the proteins which serve as an anchor for 
intracellular fibers (actin). Different types of desmosomes and tight junctions connect 
neighboring cells with intercellular fibers. On the inside, cytoplasmic plaques connect them to 
the cytoskeleton. These structures serve as mechanical elements, whereas 
gap junctions connect cells metabolically. 

The extracellular matrix consists of a network of proteins, glycoproteins and 
polysaccharides. Different proteins are present in relation to different mechanical demands. 
Elastin is found in tissues with high elasticity (lungs, heart) whereas collagen, a more 
hard-wearing protein, is found in tendons and ligaments. Fibronectin is an extracellular protein 
highly important for cell adhesion. 

Reference: Murray J et al (1992): Cell Motil Cytoskeleton 22: 211-223. 

Within the overall group of Cell Structure and Motility several categories of proteins 
are coded for by clones of the invention: 

Collagen alpha chain proteins : The typical (xxG)n repeat of collagen proteins and 
Pfam von Willebrand factor type A domain(s) in these proteins suggest that they are collagen 
alpha chains. These proteins can find application in modulation of connective tissue, bone and 
cartilage development and maintenance. OMIM reports that collagen alpha chains have 
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases: 1) Osteogenesis imperfecta, type I (OMIM #166200); 2) Osteogenesis 
imperfecta congenita (OMIM #166210); 3) Alport Syndrome, X-linked (OMIM #301050); 4) 
Thrombasthenia of Glanzmann and Naegeli (OMIM *273800); 5) Ehlers-Danlos Syndrome, 
Type VII (OMIM #130060); 6) Marfan Syndrome (OMIM #154700); 7) Alport Syndrome, 
Autosomal Recessive (OMIM #203780); 8) Alpha-2-Deficient Collagen Disease (OMIM 
203760); 9) Goodpasture Syndrome (OMIM 233450); 10) Osteogenesis Imperfecta, 
progressively deforming, with normal sclerae (OMIM #259420); 11) Ehlers-Danlos 
Syndrome, Type VII Autosomal Recessive (OMIM *225410); and 12) Osteogenesis 
imperfecta, Type IV (OMIM #166220). OMIM reports that von Willebrand factor type A 
domains have associations (as potentially diagnostic, therapeutic, causative, and/or related, 
etc.) with the following diseases: 1) Hemophilia A (OMIM *306700); 2) Von Willebrand 
Disease (OMIM *193400); 3) Giant Platelet Syndrome (OMIM *231200); 4) Thrombasthenia 
of Glanzmann and Naegeli (OMIM *273800); 5) Congenital Thrombotic Disease due to 
protein C deficiency (OMIM #176860); 6) Polycystic Kidney Disease 1 (OMIM *601313); 7) 
Nephrogenic Diabetes Insipidus (OMIM *304800); 8) Factor V Deficiency (OMIM *227400); 
and 9) Dentatorubral-Pallidoluysian Atrophy (OMIM *125370). Clones in this category 
include: fbr2_2b5. 
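
By way of illustration only, an (xxG)n repeat of the kind mentioned above can be located in a one-letter protein sequence with a simple pattern search. The function name longest_xxg_run and the default threshold of five consecutive triplets are assumptions made for this sketch; they are not criteria taken from the present disclosure, and a real annotation would rely on curated domain models such as those of Pfam.

```python
import re


def longest_xxg_run(sequence, min_repeats=5):
    """Return the longest stretch of consecutive X-X-G triplets.

    X is any residue letter and G is glycine. Returns an empty string
    if no stretch reaches min_repeats triplets (an assumed threshold).
    """
    pattern = re.compile(r"(?:[A-Z]{2}G){%d,}" % min_repeats)
    runs = [match.group(0) for match in pattern.finditer(sequence.upper())]
    return max(runs, key=len, default="")


# Toy sequence: two leading residues, six PPG triplets, two trailing residues.
toy = "MK" + "PPG" * 6 + "AA"
run = longest_xxg_run(toy)
```

The search is frame-independent: the regular expression engine tries each start position in turn, so the repeat is found regardless of how many residues precede it.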

Radial spokehead protein: Radial spokehead proteins, e.g., Chlamydomonas 
reinhardtii radial spokehead protein of flagella or axoneme and the Strongylocentrotus 
purpuratus sea urchin spermatozoa protein p63, and human proteins with similarity thereto 
are important for the maintenance of a planar form of sperm flagellar beating. The human 
protein(s) can find application in modulating the structure of the human spermatozoa radial 
spoke head and modulation of sperm motility in men (e.g., in sterility). Clones in this 
category include: tes3_15i5. 

Ankyrins : Ankyrins are peripheral membrane proteins which interconnect integral 
proteins with the spectrin-based membrane skeleton. Thus these proteins are involved in 
coupling of cytoskeleton and cell membrane. OMIM reports that ankyrins have associations 
(as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
diseases: 1) Hereditary Spherocytosis (OMIM *182900); 2) Hemolytic Poikilocytic Anemia 
due to reduced ankyrin binding sites (OMIM 141700); 3) Atypical Elliptocytosis (OMIM 
225450); 4) Autosomal recessive spherocytosis (OMIM #270970); 5) Werner Syndrome 
(OMIM *277700); and 6) Rhesus-unlinked type Elliptocytosis (OMIM #130600). Clones in 
this category include: tes3_l 817. 

FGD1-related F-actin binding protein (Frabin/FGD1) : FGD1-related F-actin-binding 
protein (Frabin/FGD1) is a novel F-actin-binding protein. The gene locus fgd1 seems to be 
responsible for faciogenital dysplasia or Aarskog-Scott syndrome (OMIM 305400). Frabin 
binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 
cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase activation, as 
described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 and 
the actin cytoskeleton. In yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated protein 
morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events 
and induces the JNK/SAPK protein kinase cascade, which leads to the activation of 
transcription factors within the nucleus. Clones in this category include: tes3_72kl5. 

Paramyosins : Paramyosin is a major structural component of thick filaments in 
invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. Clones in this category include: tes3_7b22. 

Tuftelin : Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved 
in calcification, these proteins are also expressed in the uterus matrix. The new protein can 
find application in modulation of tissue calcification, especially in the uterus. As reported by 
OMIM, tuftelin has been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with amelogenesis imperfecta (OMIM *600087). Clones in this category 
include: utel_19g22. 

Cell Adhesion Regulator (CAR1) : CAR1 is involved in the regulation of cell-cell 
adhesion. OMIM reports the association (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) of CAR1 with tumor suppression by the reduction of tumor invasion 
(OMIM *116935). Clones in this category include: utel_24j6. 

Differentiation/Development 

Almost every multicellular organism originates from meiotic cell divisions and the 
recombination of a paternal and a maternal set of chromosomes. After fertilization of the egg, 
all cells of a body originate from this one cell. Thus the cells of the developing body are 
initially genetically alike. But phenotypically they become very different. They are 
specialized to a certain cell type and arranged in an organized pattern to a certain type of 
tissue and the whole structure has the well-defined shape of an organ. All these features are 
determined by the DNA sequence of the genome, which is reproduced in every cell. Each cell 
acts on the genetic instructions given at a certain time and at a certain place of development 
and plays its individual part in the multicellular organism. Cell differentiation may be divided 
into three general steps: cell cycle exit, apoptosis protection and tissue specific gene 
expression. These processes are coordinated to provide the final and unique tissue 
characteristics. 



An animal cell that has achieved a certain level of development is said to be 
determined. This differentiation of a cell may be irreversible and in that case the cell may be 
renewed only by simple duplication. Other cells are renewed by means of stem cells which 
are immortal (e.g. stem cells of the bone marrow, epidermal stem cells). The genetic control 
of development is extensively studied in non-vertebrates and vertebrates. The classical animal 
model is the fruit fly Drosophila and the modern model is the transgenic mouse. Animal 
transgenesis has proven to be useful for physiological as well as physiopathological studies. 
Besides the approach based on the random integration of a DNA construct in the mouse 
genome, gene targeting can be achieved using totipotent embryonic stem cells for targeted 
transgenesis. Transgenic mice are then derived from the embryonic stem cells. This allows 
the introduction of null mutations in the genome (so-called knock-out) or the control of the 
transgene expression by the endogenous regulatory sequence of the gene of interest 
(so-called knock-in). Mice can be created that express wild-type genes, mutant genes, marker 
genes or cell lethal genes in a tissue specific manner. These animal models make it possible to 
follow changes in tissue and organ development and lead to a better understanding of the cellular 
function of many genes or to the generation of animal models for human diseases. 
Fundamental problems in immunology, onset and development of cancer, regulation in fatty 
acid metabolism, aspects of cardiovascular function, control of the central nervous system 
development, analysis of reproductive development and function are only some examples of 
research interests. 

The final stage of cell differentiation is growth arrest. In animal tissues with rapid cell 
turnover terminally differentiated cells undergo programmed cell death. The cells have the 
ability to kill themselves by activating an intrinsic cell suicide program when they are no 
longer needed or have become seriously damaged. The execution of this program is termed 
apoptosis. Apoptosis is of importance for development and homeostasis of animals. The key 
components of this program have been conserved in evolution from worms (C. elegans) to 
insects (Drosophila) to humans. The roles of apoptosis include the sculpting of structures 
during development, deletion of unneeded cells and tissues, regulation of growth and cell 
number, and the elimination of abnormal and potentially dangerous cells. In this way 
apoptosis provides a "quality control mechanism" that limits the accumulation of harmful cells, 
such as virus-infected cells and tumor cells. On the other hand, inappropriate apoptosis is 
associated with a wide variety of diseases, including AIDS, neurodegenerative disorders and 
ischemic stroke. Because it is now clear that apoptosis is a result of an active, gene-directed 
process, it should be eventually possible to manipulate this form of cell death by developing 
drugs that interact with its recently identified mechanisms of action. Inducers of cell 
differentiation, cell cycle arrest and apoptosis might be the novel molecular targets for new 
anticancer agents in addition to the signaling pathways for growth factors and cytokines. 

Proteins, factors, receptors and genes of importance in apoptosis : 

Proteases: 

- Calpain, an intracellular cysteine protease, exact role unknown. 

- Caspase-1 to Caspase-11, a family of proteases synthesized as inactive 
proenzymes. Targets of the activated enzymes include: poly(ADP-ribose) polymerase, 
DNA-dependent protein kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton 
components (actin). 

- Granzyme B, a serine protease released by cytotoxic T-cells. 

Receptors: 

- CD95 (synonyms: Fas, APO-1), a receptor protein of the TNF-receptor family 
which includes TNF-R1 and TNF-R2 with the common characteristic of a 70 amino acid 
cytoplasmic domain. 

- FADD (synonym: MORT-1), a cytoplasmic protein 

- DR-3 (synonym: APO-3) a member of the TNF-receptor-family 

- DR-4 and DR-5 

Genes: 

- ced-3, ced-4 and ced-9 encode the general apoptotic and antiapoptotic program in 
Caenorhabditis elegans. Apaf-3 is the mammalian homologue of ced-3. 

- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak: a large gene family that can either inhibit or 
promote apoptosis. 

- Cytokine response modifier A, a cowpox virus gene whose gene product inhibits 
caspases. 

Others: 

- Caspase-activated DNase (CAD), which causes DNA fragmentation in the nucleus, 
and its inhibitor (ICAD). 

- Ceramide, a complex lipid that acts as a second messenger. 

- c-Jun N-terminal kinase (JNK) is a proline-directed kinase 

- p53 protein is essential for the induction of apoptosis as a response to chromosomal 
damage. 

- RAIDD, a death signal-transducing protein. 

- Receptor interacting protein (RIP) is an accessory protein with a death domain and a 
serine/threonine kinase activity. 

- Sphingomyelinase, an enzyme that hydrolyzes the complex lipid sphingomyelin to 
ceramide. 

- Tumor necrosis factor (TNF) is a type II membrane protein. 

- TNF-receptor associated factor (TRAF2) is an accessory protein that can bind to 
both TNF-R1 and TNF-R2. 

Within the overall group of Differentiation/Development, several categories of 
proteins are coded for by clones of the invention: 

Interleukins (e.g. Interleukin-7) : Interleukin precursors related to interleukin-7, for 
example, are expected to act as new growth factors for human B lineage cells. Additionally, 
these proteins should induce the gene rearrangement of the T-cell receptor repertoire, leading 
to thymocyte commitment, and subsequently induce both cytotoxic T-cell- and 
lymphocyte-activated killer cells. These interleukins could find clinical application in a variety of 
conditions of hematolymphopoietic failure and different tumors, because of their recruitment 
of B cell lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells (OMIM 
*146660). Clones in this category include: tes3_35e21. 

Testis-specific Y-encoded proteins : The TSPY genes are arranged in clusters on the Y 
chromosome of many mammalian species. TSPY is believed to function in early 
spermatogenesis and is a candidate for GBY, the putative gonadoblastoma-inducing gene on 
the Y. Proteins of the TSPY-SET-NAP1L1 family represent proteins closely related to 
TSPY. These proteins seem to be involved in early spermatogenesis. Clones in this category 
include: fbr2_2dl5. 

Intracellular transport and trafficking 

Eukaryotic cells rely for their viability on the partitioning of many basic cellular 
processes into membrane-bounded organelles. These are the nucleus, endoplasmic reticulum 
(ER), Golgi apparatus, endosomes, lysosomal compartments, mitochondria and peroxisomes. 
Most molecules destined for the lysosome, cell surface and outside the cell are routed through 
the ER and Golgi, which together with the vesicular intermediates between them, comprise 
the secretory pathway (Palade 1975). In the ER and Golgi compartments proteins are sorted, 
modified and often assembled into complexes en route to their final destination. Incorrectly 
assembled proteins are retained in the ER until they fold correctly or are targeted for 
degradation. Additional proteins are translocated into and function within the lumenal spaces 
of organelles or are secreted. Thus a large proportion of proteins synthesized require targeting 
to membranes either for insertion into or transport across them. A major purpose of this is 
growth. The secretory pathway is dependent on an intact cytoskeleton and also closely linked 
to general metabolism by affecting ribosome biogenesis (Mizuta and Warner, 1994). A huge 
number of proteins is required for targeting, translocation and sorting of newly synthesized 
proteins. 

The first step in sorting is the recognition of cis-acting targeting or signal sequences 
that organelle-targeted proteins contain. This is carried out by cytosolic targeting factors 
and/or receptors on the membrane to which the protein is targeted. In some cases the primary 
sequences are extremely degenerate, with only the overall character being conserved 
(hydrophobicity for an ER signal sequence, helical amphiphilicity for a mitochondrial targeting 
sequence) (Kaiser et al., 1987; Lemire et al., 1989). Following the targeting step, proteins are 
either inserted into or transported across the membrane (translocated) through a proteinaceous 
apparatus (termed the translocon). Translocons include or recruit motors to drive the 
translocation process in the correct direction (Schatz and Dobberstein, 1996). 

Defined intracellular protein transport steps: 

• ER 

- targeting to the ER 

- translocation into the lumen of the ER and, depending on the presence of 
certain signals in the peptide sequence, transport through the Golgi complex 

• Mitochondria 

- targeting 

- translocation 

• Peroxisomes 

• The general secretory pathway 

- protein modification, assembly and quality control in the ER 

- vesicle-mediated trafficking 

- vesicle docking and fusion 

- transport through the Golgi apparatus and sorting at the trans-Golgi 

- transport to the cell surface 

- transport routes to the lysosome 

• Endocytosis 

• Specialized protein transport routes 

• Protein export from the cytoplasm 

References: Palade, G (1975) Science 189: 347-358; Mizuta et al. (1994) Mol Cell 
Biol 14: 2493-2502; Kaiser et al. (1987) Science 235: 312-317; Lemire et al. (1989) J Biol 
Chem 264: 20206-20215; Schatz et al. (1996) Science 271: 1519-1526. 

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is a prerequisite for a tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Sudhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a 
SNAP-25 molecule, coalesce into an elongated alpha-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, indicating that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, 
most likely through a secondary factor termed a GDI dissociation factor (GDF), which 
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 
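
The Rab cycle described above can be pictured as a small state machine. The following is a minimal illustrative sketch in Python, not part of the disclosed methods; the three coarse states, the state names, and the helper functions `step` and `run_cycle` are assumptions introduced here for illustration, and the model ignores effectors, kinetics, and the geranylgeranyl modification.

```python
from enum import Enum


class RabState(Enum):
    """Coarse Rab states (hypothetical names) following the cycle in the text."""
    CYTOSOLIC_GDI_BOUND = "GDP-bound, cytosolic, in complex with GDI"
    MEMBRANE_GDP = "GDP-bound, membrane-associated"
    MEMBRANE_GTP = "GTP-bound, membrane-associated, free to bind effectors"


# (current state, acting factor) -> next state
TRANSITIONS = {
    # GDF displaces GDI and the Rab is recycled onto a membrane.
    (RabState.CYTOSOLIC_GDI_BOUND, "GDF"): RabState.MEMBRANE_GDP,
    # GEF promotes release of GDP and loading of GTP.
    (RabState.MEMBRANE_GDP, "GEF"): RabState.MEMBRANE_GTP,
    # GAP accelerates GTP hydrolysis, switching the GTPase off.
    (RabState.MEMBRANE_GTP, "GAP"): RabState.MEMBRANE_GDP,
    # GDI extracts the GDP-bound Rab from the membrane.
    (RabState.MEMBRANE_GDP, "GDI"): RabState.CYTOSOLIC_GDI_BOUND,
}


def step(state, factor):
    """Apply one regulatory factor; reject transitions the text does not describe."""
    key = (state, factor)
    if key not in TRANSITIONS:
        raise ValueError(f"{factor} does not act on a Rab that is {state.value}")
    return TRANSITIONS[key]


def run_cycle(start, factors):
    """Apply a sequence of factors in order and return the final state."""
    state = start
    for factor in factors:
        state = step(state, factor)
    return state
```

Under these assumptions, applying GDF, GEF, GAP, and GDI in that order to a cytosolic, GDI-bound Rab returns it to its starting state, which corresponds to one complete round of the cycle.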

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by 
a coiled-coil stalk region and an RBD that specifically binds Rab6 (Echard et al., 1998). An
additional link with the cytoskeleton is provided by the Rab effector, Rabphilin-3A.
Rabphilin-3A has been shown in vitro to interact with α-actinin, an actin-bundling protein, but
only when not bound to Rab3A (Kato et al., 1996). These results raise the intriguing
possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby 
play an active role in targeting vesicles to their appropriate destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885; Echard et al. (1998)
Science 279, 580-585; Geppert et al. (1998) Annu. Rev. Neurosci. 21, 75-95; Guo et al.
(1999) EMBO J. 18, 1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778;
Novick and Zerial (1997) Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999) Curr. Biol.
9, 159-162; Poirier et al. (1998) Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17,
1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al. (1999) J. Biol. Chem. 274,
5649-5653.

Within the overall group of Intracellular Transport and Trafficking several categories 
of proteins are coded for by clones of the invention. 
Rab proteins:

Rab1B is essential for the intracellular transport of nascent low density lipoprotein
(LDL) receptor. It is discussed as a universal mediator of endoplasmic reticulum to Golgi
transport of membrane glycoproteins in mammalian cells. Clones in this category include:
fbr2_2il7, fbr2_3b!6.

Rab10 appears concentrated on membranes in the perinuclear region. Rab10 has been
associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases as reported by OMIM: 1) Choroideremia (OMIM *303199); and 2) Rett
Syndrome (OMIM 312750). Clones in this category include: fbr2_62119.

In mice, Rab17 shows epithelial cell specificity. Rab17 is discussed as a candidate gene
for the mouse mutations ln (leaden), Tw (twirler), and ax (ataxia). Cloned from a brain cDNA
library, the new putative Rab protein is expected to be involved in vesicle trafficking within
neuronal cells. These proteins can find application in modulating the transport of vesicles
inside neuronal cells, which are essential for development of functional dendritic processes.
Clones in this category include: fbr2_41ml5.

Ankyrin G: The ankyrin 3 gene encodes a novel ankyrin, which is expressed in
multiple tissues, with very high expression at the axonal initial segment and nodes of Ranvier
of neurons in the central and peripheral nervous systems. Ankyrin G undergoes several
tissue-specific alternative mRNA processing events. The different ankyrin G proteins participate in
maintenance/targeting of ion channels and cell adhesion molecules to nodes of Ranvier and
axonal initial segments. Ankyrin G has been associated (as potentially diagnostic,
therapeutic, causative, and/or related, etc.) with Werner disease (OMIM *277700). Clones
in this category include: fkd2_24p5.

Zn-T-transporters: The Zn-T-transporters are membrane proteins that facilitate
sequestration of zinc in endosomal vesicles. In the brain, ZnT-3 seems to be involved
in the accumulation of zinc in synaptic vesicles. Zinc (Zn) is an essential element in normal
development and metabolism. Recent studies show that in Alzheimer's disease, Zn functions
as a double-edged sword, affording protection against Alzheimer's amyloid beta peptide (the
major component of senile plaques) at low concentrations and enhancing toxicity at high
concentrations by accelerated aggregation of the amyloid beta peptide. These proteins can
find application in modulation of zinc transport in neuronal cells, thus providing a means of
modulating Alzheimer's amyloid beta peptide plaque formation (OMIM *602878,
*602095). Clones in this category include: fbr2_62fl 0.

Metabolism 

This group includes proteins which are involved in the uptake and consumption of 
nutrients, and enzymes which are part of the biochemical pathways for energy metabolism or
which are involved in the supply of building blocks of nucleic acids, proteins (NTPs, dNTPs, 
amino acids) for DNA/RNA and protein synthesis, and fatty acids (membranes), to allow for 
the generation of higher order structures. This group constitutes the most important and 
largest group in prokaryotes and lower eukaryotes. The higher the evolutionary level of an 
organism is, however, the more other protein classes like 'signal transduction', 'cell cycle' 
and 'differentiation and development' increase in importance and number of representatives. 

Proteins involved in the metabolism of energy and compounds (here: other than 
nucleic acids or proteins) are usually the products of housekeeping genes; they are often
constitutively and/or ubiquitously expressed. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of Metabolism: 

NAT1, ARD1: In yeast, ARD1 and NAT1 are required for the expression of an
N-terminal protein acetyltransferase 1. NAT1 controls full repression of the silent mating type
locus HML, sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1
complex. These proteins can find application in modulating NAT assembly and action and therefore
could be important in the metabolism of drugs and environmental mutagens (OMIM *108345).
Clones in this category include: fbr2_3g8.

Apolipoprotein E receptor : In LDL-receptors the class A domains form the binding 
site for LDL and calcium. The acidic residues between the fourth and sixth cysteines are 
important for high-affinity binding of positively charged sequences in LDLR's ligands. These 
proteins can find application in modulation of cholesterol binding and transport by LDL- 
receptors and LDL-binding proteins. In normal individuals, chylomicron remnants and very 
low density lipoprotein (VLDL) remnants are rapidly removed from the circulation by 
receptor-mediated endocytosis in the liver. In familial dysbetalipoproteinemia, or type III 
hyperlipoproteinemia (HLP III), increased plasma cholesterol and triglycerides are the 
consequence of impaired clearance of chylomicron and VLDL remnants because of a defect 
in apolipoprotein E. Accumulation of the remnants can result in xanthomatosis and premature 
coronary and/or peripheral vascular disease. OMIM reports that apolipoprotein has
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the
following diseases: 1) Familial hypercholesterolemia (OMIM 143890); 2) Familial combined
hyperlipidemia (OMIM 144250); and 3) Alzheimer disease (OMIM #104300). Clones in this
category include: fbr2_62017.

Ubiquitin carboxyl-terminal hydrolases: Ubiquitin carboxyl-terminal hydrolases (EC
3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol proteases that recognize and hydrolyze
the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the
processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. OMIM reports
that ubiquitin-specific proteases have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) lung carcinoma (OMIM
*603486); 2) X-linked retinal diseases (OMIM *300050); 3) oncogenesis (OMIM *300050); 4)
ovarian cancer (OMIM *300050). Clones in this category include: fbr2_78k24; htes3_27dl.

Phosphoserine signature (phosphoglucomutases, phosphomannomutase): These
proteins take part in the conversion of hexose phosphates. OMIM reports that these proteins
have associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
the following disease: Fanconi-Bickel Syndrome (OMIM #227810). Clones in this category
include: fkd2_24bl5.

NADH ubiquinone oxidoreductase: NADH:ubiquinone oxidoreductase is the first
enzyme in the respiratory electron transport chain of mitochondria. It is a membrane-bound
multi-subunit protein. The bovine heart enzyme contains about 40 different polypeptides.
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following disease: Branchio-oto-renal syndrome
(OMIM *6601445). Clones in this category include: fkd2_3ol7.

Transketolases: Transketolase requires thiamin pyrophosphate as a cofactor and shows
a wide specificity for both reactants; e.g., it converts hydroxypyruvate and R-CHO into CO(2)
and R-CHOH-CO-CH(2)OH. OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
disease: Wernicke-Korsakoff Syndrome (OMIM *277730). Clones in this category include:
tes3_17117.

Fatty acid-CoA synthetases/ligases: These proteins contain AMP-binding domain
signature(s), which are present in enzymes that act via an ATP-dependent covalent binding
of AMP to their substrate. This domain is found in several CoA synthetases, such as
acetate-CoA ligase (EC 6.2.1.1), long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase.
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) Alport syndrome, mental
retardation and elliptocytosis (OMIM *300157); 2) Adrenoleukodystrophy (OMIM *300100).
Clones in this category include: tes3_35kl7.

ADP/ATP or Adenine Nucleotide Translocators: These proteins contain
mitochondrial energy transfer signature(s) and are most abundant in mitochondria. In its
functional state, the translocator is a homodimer of 30-kD subunits embedded asymmetrically in the inner
mitochondrial membrane. The dimer forms a gated pore through which ADP is moved into the
matrix and ATP is moved from the matrix into the cytoplasm. OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
diseases: 1) cardiomyopathy (OMIM *103220); 2) myopathy (OMIM *103220);
3) Progressive external ophthalmoplegia (OMIM *601227). Clones in this category include:
tes3_35nl2.

Carboxylesterases: OMIM reports that these proteins have associations (as potentially
diagnostic, therapeutic, causative, and/or related, etc.) with the following diseases:
1) hepatic carboxyl esterase with detoxification of foreign compounds (OMIM *114835); 2)
non-Hodgkin lymphoma (OMIM *114835); 3) B-cell chronic lymphocytic leukemia (OMIM
*114835); 4) rheumatoid arthritis (OMIM *114835). Clones in this category include:
tes3_35n9.

Heat shock proteins: OMIM reports that these proteins have associations (as
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following
diseases: 1) The 27 kD heat shock protein has been correlated with thermotolerance in response to
environmental challenges and developmental transitions (OMIM * 602 1295). Clones in this
category include: utell_23el3.

Nucleic acid management 

The genetic information is stored in the form of nucleic acids in all organisms. Two 
kinds of nucleic acids exist, DNA and RNA. Whereas the more stable DNA in most 
organisms constitutes the storage form of the genetic information, the labile RNA and in 
particular mRNA is an intermediate used for the temporal expression of specific genes. 

In eukaryotes, DNA is usually a double stranded linear molecule consisting of two
antiparallel strands and made up of deoxyribose, a phosphate backbone and the four bases
A, C, G, and T. The DNA of some organisms has a ring structure. The structure of DNA was
unraveled by Watson and Crick in 1953. DNA is a directional molecule; its direction is
determined by the carbon atoms of the sugar.

The most important processes dealing with nucleic acids are: 

• replication (e.g. DNA polymerases, Telomerase) 

• transcription (RNA polymerases) 

• RNA processing (maturation - splicing and degradation) 

• in addition, enzymes and proteins exist which require a nucleic acid (mostly RNA) in the 
active center to be functional (ribozymes - e.g. RNase, Ribosomal proteins) 

The DNA of a cell is replicated in the S-phase of the cell cycle. Several enzymes carry 
out the task of doubling this nucleic acid. Like all steps of the cell cycle, the process of
replication is tightly regulated. The enzyme DNA polymerase and several other proteins are
involved in this process. Whereas many prokaryotes have only one origin of replication
(i.e., the starting point of the replication cycle), in eukaryotic DNAs (chromosomes) multiple
such start points exist. The switch from the synthesis (S) phase to the subsequent G2 or M
phases of the cell cycle is dependent on the completion of replication. This makes clear
that a number of proteins are involved in the replication itself as well as in the control of the
process. Since most eukaryotic chromosomes are linear structures, additional proteins and 
enzymes are necessary to make sure that the structure is maintained through successive 
generations. This includes those proteins necessary to build the three dimensional structure of 
chromosomes (e.g. histones) and the structural network of the nucleus and nucleolus 
(including the defined localization of transcriptionally active genes in the vicinity of nucleoli) 
but also such enzymes as telomerase which guarantees the integrity of the chromosomal ends. 

The expression of genes is usually performed in two steps. First a messenger RNA 
(mRNA) is produced (transcribed) in one to many copies and second this mRNA is translated 
into the protein product. The regulation of transcription is discussed under the separate 
heading 'transcription factors', but also the classes 'signal transduction', 'development', 'cell 
cycle' and others are affected as the expression of certain genes determines the fate of a cell 
or organism. 

The primary transcript (hnRNA - heterogeneous nuclear RNA) is a single stranded
one-to-one copy of the gene as it is located on the chromosome. Before a protein can be
translated, the transcript must mature, a process that is initiated while transcription is still under way. First, a 5' cap
structure is enzymatically and covalently added to the RNA, blocking the 5' end of the RNA.
Second, when the RNA polymerase has terminated polymerization, the enzyme poly A
polymerase adds varying numbers of adenine residues to the 3' end of the transcript. This
enzyme recognizes the sequence AAUAAA or AUUAAA (+ some minor variations), cuts the
RNA 10-30 nucleotides downstream and adds the A residues. The size of the poly A
sequence affects the stability of the RNA. Finally, in the process of splicing, the introns
present on the genomic level and also present in the hnRNA are spliced out by a multi-protein
complex consisting of several proteins and RNAs. The mature mRNA is exported
to the cytoplasm where it is translated with the help of ribosomes.
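
The polyadenylation step described above lends itself to a short illustration. The following Python sketch, offered only as an example under stated assumptions, scans an RNA sequence for the signals AAUAAA or AUUAAA and reports the window 10-30 nucleotides downstream in which cleavage would be expected. The function name `find_cleavage_windows`, the choice to measure the distance from the end of the signal, and the omission of the "minor variations" of the signal are assumptions made here for illustration.

```python
import re

# Matches AAUAAA or AUUAAA; the lookahead allows overlapping signals to be found.
POLY_A_SIGNAL = re.compile(r"(?=A[AU]UAAA)")
SIGNAL_LENGTH = 6


def find_cleavage_windows(rna, min_offset=10, max_offset=30):
    """Return (signal_start, earliest_cut, latest_cut) for each poly A signal.

    Positions are 0-based. A cut at position p means the transcript is
    rna[:p] before the A residues are added. Offsets are counted from the
    end of the signal (an assumption of this sketch).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    windows = []
    for match in POLY_A_SIGNAL.finditer(rna):
        signal_start = match.start()
        signal_end = signal_start + SIGNAL_LENGTH
        earliest_cut = signal_end + min_offset
        latest_cut = min(signal_end + max_offset, len(rna))
        if earliest_cut <= len(rna):  # skip signals too close to the 3' end
            windows.append((signal_start, earliest_cut, latest_cut))
    return windows
```

For a hypothetical transcript consisting of GG, the signal AAUAAA, and forty C residues, the sketch reports a single signal at position 2 with a cleavage window spanning positions 18 to 38.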

The half life of RNA is usually much shorter than that of DNA. Usually, the mRNA is 
degraded shortly after synthesis, to guarantee a very defined window of expression of a given 
gene. This regulation is necessary to specifically maintain or change the set of proteins 
present at any time in a cell. Specific regions in the 3'UTR (untranslated region) determine 
the stability of the mRNA in the cytoplasm before it is degraded by RNases, some of which are
enzymes consisting of both protein and RNA.

References: Watson and Crick (1953) Nature 171: 737-738. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Nucleic acid management" and include, among others, the following:

RNA helicases including DEAD/H box helicases : RNA helicases comprise a large 
family of proteins that are involved in basic biological systems such as nuclear and 
mitochondrial splicing processes, RNA editing, rRNA processing, translation initiation, 
nuclear mRNA export, and mRNA degradation. RNA helicases are essential factors in cell 
development and differentiation, and some of them play a role in transcription and replication 
of viral single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. DEAD box proteins have been associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.), as reported by OMIM, with the following disease processes and/or
genes: 1) ataxia-telangiectasia gene: "A human gene (DDX10) encoding a putative DEAD-box
RNA helicase at 11q22-q23", Genomics 33: 199-206, 1996, Savitsky et al. (OMIM
*601235); 2) hematopoietic tumors: "Cloning and expression of a murine cDNA homologous
to the human RCK/P54, a lymphoma-linked chromosomal breakpoint 11q23", Gene 166: 293-6,
1995, Seto et al. (OMIM *600326); 3) dermatomyositis: a) "The major dermatomyositis-specific
Mi-2 autoantigen is a presumed helicase involved in transcriptional activation",
Arthritis Rheum. 38: 1389-1399, 1995, Seelig et al. (OMIM *603277); b) "Two forms of the
major antigenic protein of the dermatomyositis-specific Mi-2 autoantigen" (Letter), Arthritis
Rheum. 39: 1769-1771, 1996, Seelig et al. (OMIM *603277); c) "The dermatomyositis-specific
autoantigen Mi2 is a component of a complex containing histone deacetylase and
nucleosome remodeling activities", Cell 95: 279-289, 1998, Zhang et al. (OMIM *603277); 4)
Muscular Dystrophy, Pseudohypertrophic Progressive, Duchenne and Becker Types (OMIM
*310200); 5) Mucopolysaccharidosis Type IVA (OMIM *253000); 6) Albinism I (OMIM
*203100); 7) Wilms Tumor 1 (OMIM *194070); 8) Spinocerebellar Ataxia 7 (OMIM
*164500). Clones in this category include: fbr2_23M0, fbr2_3cl8, fbr2_6ol7, fbr2_82i24,
and tes3_14h21.

Inorganic pyrophosphatase : Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the 
enzyme responsible for the hydrolysis of pyrophosphate (PPi) which is formed as the product 
of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of 
divalent metal cations, with magnesium conferring the highest activity. Clones in this 
category include: fbr2_64al5. 

DNA-damage-inducible protein (dinP) or proteins induced by DNA damage: The
dinB/P pathway is a second SOS pathway in E. coli. Genes related to this pathway seem to be
involved in modulating DNA repair and mutagenesis. Clones in this category include:
fbr2_72bl8.

Proteins with myc-type, helix-loop-helix dimerization domain signature(s): This
helix-loop-helix domain, which mediates protein dimerization, has been found in proteins such as the
myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers. Therefore,
these proteins could be novel DNA-binding proteins. Clones in this category include:
fbr2_72112.

Cytosolic ribosomal protein L36: L36 seems to be part of the eukaryotic ribosomal
peptidyl transferase center and can find application in modulation of ribosome assembly,
maintenance and activity. Clones in this category include: fkd2_3b2.

Ribonuclease H: Ribonuclease H proteins are RNA-modifying proteins and have
been associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with
the following diseases as reported by OMIM: 1) Adenomatous Polyposis of the Colon (OMIM
*175100); 2) Retinoblastoma (OMIM *180200); and 3) Von Hippel-Lindau Syndrome
(OMIM *193300). Clones in this category include: phtes3_15j3.



Signal transduction 

Cells in higher order organisms need to continuously communicate with their
environment, especially with other cells of the same organism, in order to maintain the
function and specialization of the whole system these cells are part of. This important task of
communication is performed with the help of cell-surface receptors which receive and transmit
signals from outside into the cell.

G-proteins

The largest known family of cell-surface receptors is that of the G-protein-coupled receptors,
which mediate the transmission of diverse stimuli such as neurotransmitters, glycopeptides,
hormones, peptides, odorant molecules, and photons. The functional unit of these receptors is
composed of the receptor molecule itself (GPCR), which is anchored in the cytoplasmic
membrane with seven membrane-spanning domains; the heterotrimeric G-protein, which is
composed of α- and βγ-subunits (Gα and Gβγ); and the effectors that interact with Gα and/or
Gβγ. In particular, the dissociated Gα and Gβγ can regulate the activities of a number of
effector molecules such as adenylate cyclases, phospholipase C isoforms, ion channels, and
tyrosine kinases, resulting in a variety of cellular functions. The process of signal
transduction must be tightly regulated and reversible in order to avoid overstimulation, to
achieve signal termination, and to render the receptor responsive to subsequent stimuli
[Iacovelli, L. et al. (1999) FASEB J. 13, 1-8; Hamm, H.E. (1998) J. Biol. Chem. 273,
669-672].

G-proteins are GTPases that, upon binding of GTP, change their conformation, which
in turn unmasks structural motifs, in particular the so-called effector loop, which can
mediate the interactions with target proteins, or effectors, of the GTPases. This ability enables
the GTPases to cycle between active, GTP-bound and inactive, GDP-bound conformations
and in the process to function as molecular traffic lights in a multitude of signal transduction
pathways. The most important of the signal transduction pathways that are regulated with
the help of G-proteins are that of phospholipase C / protein kinase C and that of adenylate
cyclase / protein kinase A.

The cycling of GTPases is tightly regulated by three main classes of proteins: the
exchange of GDP for fresh GTP is facilitated by guanine nucleotide exchange
factors (GEFs), the hydrolysis of GTP to GDP is accelerated by GTPase-activating proteins
(GAPs), and the dissociation of GDP from the GTPases is inhibited by GDP dissociation
inhibitors (GDIs) [Tapon and Hall (1997) Curr. Opin. Cell Biol. 9, 86-92; Van Aelst and
D'Souza-Schorey (1997) Genes Dev. 11, 2295-2322].
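
The opposing influence of these three regulator classes can be illustrated with a deliberately simple two-state model, in which a GTPase is either GDP-bound (inactive) or GTP-bound (active). The Python sketch below is an assumption-laden illustration and not a measured kinetic scheme: the function name `active_fraction`, the basal rate constants, and the fold-change parameters are all hypothetical. At steady state, the active fraction in such a model is k_on / (k_on + k_off).

```python
def active_fraction(k_exchange, k_hydrolysis,
                    gef_fold=1.0, gap_fold=1.0, gdi_fold=1.0):
    """Steady-state GTP-bound fraction in a two-state GDP <-> GTP model.

    k_exchange   : basal rate of GDP release and GTP loading (hypothetical units)
    k_hydrolysis : basal rate of GTP hydrolysis (same units)
    gef_fold     : factor by which a GEF speeds up nucleotide exchange
    gap_fold     : factor by which a GAP speeds up GTP hydrolysis
    gdi_fold     : factor by which a GDI slows down GDP dissociation
    """
    values = (k_exchange, k_hydrolysis, gef_fold, gap_fold, gdi_fold)
    if any(v <= 0 for v in values):
        raise ValueError("all rates and fold changes must be positive")
    k_on = k_exchange * gef_fold / gdi_fold   # GDP-bound -> GTP-bound
    k_off = k_hydrolysis * gap_fold           # GTP-bound -> GDP-bound
    return k_on / (k_on + k_off)
```

With equal basal rates, half of the GTPase population is active in this model; a threefold GEF stimulation raises the active fraction to three quarters, whereas a threefold GAP stimulation or a threefold GDI inhibition lowers it to one quarter.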

SOCS-family

A conserved motif that was originally identified in proteins that negatively regulate
the signaling action of cytokines was termed the SOCS box (SOCS: Suppressor Of Cytokine
Signaling). Based on homology, five distinct structural protein classes that carry this motif
have since been identified. The function of most of these proteins is presently not known.
Common to the proteins is only the SOCS box, which is located near the C-terminus of the
respective peptides. Recently, the SOCS box has been demonstrated to induce binding of
proteins to elongins B and C, which could target the proteins (and bound substrates) to the
proteasomal protein degradation pathway (Kamura, T. et al. (1998) Genes Dev. 12, 3872-3881;
Zhang, J.-G. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 2071-2076).

The class where the SOCS box was originally described contains several members
(SOCS-1 to SOCS-7 and CIS). In addition to the SOCS box, these proteins also contain an SH2
(Src-homology 2) domain and a variable N-terminus. These SOCS proteins appear to form
part of a classical negative feedback loop that regulates cytokine signal transduction. Upon
cytokine stimulation, expression of SOCS proteins is rapidly induced and the proteins inhibit
further cytokine action. The mode of action of the SOCS proteins is variable. While SOCS-1
binds and inhibits the JAK (Janus kinases) family of cytoplasmic protein kinases [Narazaki,
M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134; Nicholson, S.E. et al. (1999)
EMBO J. 18, 375-385], CIS appears to act by competing with signaling molecules such as
the STATs (Signal Transducers and Activators of Transcription) family for binding to
phosphorylated receptor cytoplasmic domains [Yoshimura, A. et al. (1995) EMBO J. 14,
2816-2826; Matsumoto, A. et al. (1997) Blood 89, 3148-3154].

A second class of SOCS box proteins additionally contains WD-40 repeats, which were
initially identified in the mouse WSB-1 and -2 proteins. The functions of WD-40 proteins are
not completely understood but seem to be rather divergent. In Cdc4p the WD-40 repeats
probably are necessary for binding the substrate for Cdc34p [Mathias, N. et al. (1999) Mol.
Cell Biol. 19, 1759-1767]. Cdc4p is a component of a ubiquitin ligase that tethers the
ubiquitin-conjugating enzyme Cdc34p to its substrates. The posttranslational modification of
a protein by ubiquitin usually results in rapid degradation of the ubiquitinated protein by the
proteasome. The transfer of ubiquitin to substrate is a multistep process in which WD-40 repeats
might play an important function.

Other WD-40 containing proteins (e.g. the retinoblastoma binding protein RbAp48)
have been shown to bind metal ions (zinc), and this metal binding might mediate and/or
regulate protein-protein interactions which are functionally important in chromatin
metabolism [Kenzior, A.L. and Folk, W.R. (1998) FEBS Lett. 440, 425-429]. These proteins
are involved in the RAS-cAMP pathway that regulates cellular growth [Ach, R.A. et al.
(1997) Plant Cell 9, 1595-1606].

The SPRY domain has been identified in pyrin or marenostrin, a protein which is 
mutated in patients with Mediterranean fever and which is similar to the butyrophilin family. 
While butyrophilins seem to be involved in the lactation process in mammals, the function
of pyrin is unknown. Three proteins (SSB-1 to -3) have been identified to contain both SPRY
and SOCS box motifs. The function of these proteins is also not known. 

Ankyrin repeat containing proteins share a 33-residue repeating motif, an L-shaped
structure with protruding β-hairpin tips which mediate specific macromolecular interactions
with cytoskeletal, membrane, and regulatory proteins. These proteins play fundamental roles
in diverse biological activities including growth and development, intracellular protein
trafficking, the establishment and maintenance of cellular polarity, cell adhesion, signal
transduction, and mRNA transcription. Three proteins that contain ankyrin repeats (ASB-1 to
-3) have been identified to contain a C-terminal SOCS box in addition to the ankyrin
repeats. The function of these proteins or the individual domains remains to be discovered
[Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119].

A few small GTPases (RAR and RAR like) also contain a SOCS box. GTPases are
involved in signal transduction during cellular communication. The function of the SOCS box
in this type of protein is currently unclear [Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci.
USA 95, 114-119].

Ca2+ as second messenger

The bivalent cation Ca2+ is, besides cAMP, one of the two major second messengers
in eukaryotic cells. Its intracellular concentration is tightly regulated and usually kept very
low compared to the cell's environment. Ca2+ binding proteins and transporters (gap junction,
voltage-gated, second messenger-gated) help to sequester large amounts of the ion in various
organelles, from where Ca2+ can be released upon extracellular stimuli. For example, the contraction of
muscle is dependent on the presence of Ca2+ ions, which are readily transported back into
the organelles in order for the muscle to relax. In signal transduction, Ca2+ functions as a
second messenger that activates Ca2+-dependent processes through the activation of
Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are the major effector
molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases activate
phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such as
protein kinase C.

cAMP 

Cyclic AMP (cAMP) is produced by the enzyme adenylate cyclase in response to 
extracellular signals. Certain G-proteins stimulate the activity of adenylate cyclase, which 
converts ATP to cAMP and PPi. Two molecules of cAMP bind to each of the two regulatory 
subunits of cAMP-dependent protein kinase, which in turn dissociate from the two catalytic 
subunits of the heterotetramer R2C2. Upon release, the C subunits become active and 
phosphorylate substrate proteins at Ser and Thr residues. The process leading from binding of 
extracellular molecules to their receptors, through the transmission of the stimuli into the 
cell and the activation of adenylate cyclase, to the subsequent activation of cAMP-dependent 
protein kinase is one of two major signal transduction pathways in eukaryotic cells. Since the 
phosphorylation of proteins is a posttranslational modification, the kinases are 
described in the class "signal transduction." 

SARA 

Members of the transforming growth factor β (TGFβ) superfamily signal through a 
family of cell-surface transmembrane serine/threonine kinases, known as type I and type II 
receptors (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massague, 
1998). Ligand induces formation of heteromeric complexes of these receptors, and signaling 
is initiated when receptor I is phosphorylated and activated by the constitutively active kinase 
of receptor II (Wrana et al., 1994). The activated type I receptor kinase then propagates the 
signal to a family of intracellular signaling mediators known as Smads (contraction of the 
C. elegans Sma and Drosophila Mad genes, which were the first identified members of this 
class of signaling effectors). 

Three classes of Smads with distinct functions have been defined: the receptor-regulated 
Smads, which include Smad1, 2, 3, 5, and 8; the common mediator Smad, Smad4; 
and the antagonistic Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano and 
Wrana, 1998; Kretzschmar and Massague, 1998). Receptor-regulated Smads (R-Smads) act 
as direct substrates of specific type I receptors, and the proteins are phosphorylated on the last 
two serines at the carboxyl terminus within a highly conserved SSXS motif (Macias-Silva et 
al., 1996; Abdollah et al., 1997; Kretzschmar et al., 1997; Liu et al., 1997b; Souchelnytskyi 
et al., 1997). Regulation of R-Smads by the receptor kinase provides an important level of 
specificity in this system. Thus, Smad2 and Smad3 are substrates of TGFβ or activin 
receptors and mediate signaling by these ligands (Macias-Silva et al., 1996; Liu et al., 1997b; 
Nakao et al., 1997), whereas Smad1, 5, and 8 are targets of BMP receptors and propagate 
BMP signals (Hoodless et al., 1996; Chen et al., 1997b; Kretzschmar et al., 1997; 
Nishimura et al., 1998). Once phosphorylated, R-Smads associate with the common Smad, 
Smad4 (Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear translocation of the 
heteromeric complex. In the nucleus, Smad complexes then activate specific genes through 
cooperative interactions with DNA and other DNA-binding proteins such as FAST1, FAST2, 
and Fos/Jun (Chen et al., 1996; Chen et al., 1997a; Liu et al., 1997a; Labbe et al., 1998; 
Zhang et al., 1998; Zhou et al., 1998). In contrast to R-Smads and Smad4, the antagonistic 
Smads, Smad6 and 7, appear to function by blocking ligand-dependent signaling (reviewed in 
Heldin et al., 1997). 

Phosphorylation of R-Smads by the type I receptor is essential for activating the 
TGFβ signaling pathway (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and 
Massague, 1998). However, little is known of how Smad interaction with receptors is 
controlled. A novel Smad2/Smad3 interacting protein has been described (Tsukazaki T. et al., 
1998) that contains a double zinc finger, or FYVE domain, and which has been called SARA 
(Smad anchor for receptor activation). The SARA motif recruits Smad2 into distinct 
subcellular domains and co-localizes and interacts with TGFβ receptors. TGFβ signaling 
induces dissociation of Smad2 from SARA with concomitant formation of Smad2/Smad4 
complexes and nuclear translocation. Moreover, deletion of the FYVE domain in SARA 
causes mislocalization of Smad2 and inhibits TGFβ-dependent transcriptional responses. 
Thus, SARA defines a component of TGFβ signaling that functions to recruit Smad2 to the 
receptor by controlling the subcellular localization of Smad. 

Calcium 

The bivalent cation Ca2+ is, along with cAMP, one of the two major second 
messengers in eukaryotic cells. Its intracellular concentration is tightly regulated and usually 
kept very low compared to the cell's environment. Ca2+-binding proteins and transporters 
(gap junction, voltage-gated, second messenger-gated) help to sequester large amounts of the 
ion in various organelles, from where Ca2+ can be released upon extracellular stimuli. For 
example, muscle contraction depends on the presence of Ca2+ ions, which are readily 
transported back into the organelles in order for the muscle to relax. In signal transduction, 
Ca2+ functions as a second messenger that activates Ca2+-dependent processes through the 
activation of Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are the major 
effector molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases activate 
phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such as 
protein kinase C. 

Rab proteins 

In eukaryotic cells, compartmentalization is a prerequisite for the tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Südhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a SNAP-25 
molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, suggesting that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, 
most likely through a secondary factor termed a GDI dissociation factor (GDF), which 
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange 
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments, likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by 
a coiled-coil stalk region and a Rab-binding domain (RBD) that specifically binds Rab6 
(Echard et al., 1998). An additional link with the cytoskeleton is provided by the Rab 
effector Rabphilin-3A. Rabphilin-3A has been shown in vitro to interact with α-actinin, an 
actin-bundling protein, but only when not bound to Rab3A (Kato et al., 1996). These results 
raise the intriguing possibility that Rab proteins regulate vesicle interactions with the 
cytoskeleton and thereby play an active role in targeting vesicles to their appropriate 
destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some 
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target 
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and 
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle 
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors 
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A 
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits 
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p 
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab 
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore 
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab 
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE 
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and 
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991). Mol. Cell. Biol. 11, 872-885; Echard et al. (1998). 
Science 279, 580-585; Geppert et al. (1998). Annu. Rev. Neurosci. 21, 75-95; Guo et al. 
(1999). EMBO J. 18, 1071-1080; Kato et al. (1996). J. Biol. Chem. 271, 31775-31778; 
Novick et al. (1997). Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999). Curr. Biol. 9, 
159-162; Poirier et al. (1998). Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998). EMBO J. 17, 
1941-1951; Wang et al. (1997). Nature 388, 593-598; Yang et al. (1999). J. Biol. Chem. 274, 
5649-5653. 

Kinases 

Reversible posttranslational modifications of proteins are a major means of regulating 
cellular activities. Among the various modifications that are carried out by cells, the 
addition of phosphoryl groups to Ser/Thr or Tyr residues is the most important and widely 
used. The phosphorylation of proteins is accomplished by protein kinases, while the reverse 
reaction, the removal of phosphoryl groups, is carried out by phosphatases. Kinases and 
phosphatases occupy key positions, e.g. in the processes of cell proliferation, differentiation 
and communication/signaling. These processes must be tightly regulated in order to maintain 
a stable cellular state. Misregulation of kinase activities (or of phosphatase activities) is 
held responsible for a multitude of disease processes such as oncogenesis, 
inflammatory processes, arteriosclerosis, and psoriasis. 

Protein kinases constitute the largest protein family that is currently known. Several 
hundred kinases have been identified already. Classically, kinases are subdivided into two 
classes based on the amino acid residues in their substrates that are phosphorylated by the 
particular enzymes. The kinases specifically add phosphoryl groups from adenosine 
triphosphate (ATP) or, less frequently, guanosine triphosphate (GTP), either to serine and/or 
threonine or to tyrosine residues of substrate proteins. An estimated 1,000 to 10,000 proteins 
present in a typical mammalian cell are believed to be regulated by the action of protein 
kinases. 

Protein kinases are frequently integral parts of signaling cascades that transmit 
extracellular stimuli (e.g. hormones, neurotransmitters, growth or differentiation factors) into 
the cell and result in various responses by the cells. The kinases play key roles in these 
cascades as they act as 'molecular switches', turning on or off the activities of 
other enzymes and proteins, e.g. metabolic and regulatory proteins, channels and pumps, 
receptors, cytoskeletal proteins, and transcription factors. 

The regulation of kinase activities is accomplished by various means: 

The best characterized example of regulation via regulatory subunits is the 
cAMP-dependent protein kinase (PKA), which is also a prototype for second messenger-activated 
protein kinases. This enzyme consists of a heterotetramer of two catalytic (C) and 
two regulatory (R) subunits. Upon binding of two molecules of second messenger (cAMP) to 
each R subunit, the catalytic subunits are released and become active. Several isoforms exist 
of both the catalytic and the regulatory subunits. The combination of catalytic and regulatory 
subunits determines the localization of the holoenzyme and also the substrate spectrum that is 
available for phosphorylation. The consensus pattern that must be present in the substrate 
for PKA action is RRXS/T, where X can be any amino acid. 
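
As an illustration only, the RRXS/T consensus can be searched for with a regular expression. The sketch below is not part of the disclosed methods; the function name find_pka_sites and the test peptide are hypothetical, and a match marks a candidate site rather than a demonstrated phosphorylation site.

```python
import re

# RRXS/T: two arginines, any residue, then serine or threonine.
# The pattern sits inside a lookahead so that overlapping candidates are all reported.
PKA_CONSENSUS = re.compile(r"(?=(RR[A-Z][ST]))")


def find_pka_sites(sequence):
    """Return (position, motif) pairs; position is the 1-based index of the Ser/Thr."""
    sites = []
    for match in PKA_CONSENSUS.finditer(sequence.upper()):
        start = match.start(1)                      # 0-based index of the first Arg
        sites.append((start + 4, match.group(1)))   # Ser/Thr is the 4th motif residue
    return sites


# Hypothetical peptide used only to exercise the function.
print(find_pka_sites("LRRASLG"))
```

The lookahead is a design choice for completeness: a plain pattern would skip a second candidate that begins inside the first match.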

Casein kinase II is another example of a holoenzyme that consists of 
catalytic and regulatory subunits. Other kinases that are activated by second messengers are 
cGMP-dependent protein kinase and protein kinase C (PKC), which is activated by 
diacylglycerol, which in turn is produced by phospholipases by cleavage of 
phosphatidylcholine. 

Receptor kinases usually consist of an extracellular domain, which can bind effector 
molecules (e.g. growth factors and hormones) and transfer the stimulus to the intracellular 
domain of these proteins, which usually is a protein tyrosine kinase. Other tyrosine kinases 
lack an extracellular domain but are associated with receptors, which transfer the signal after 
effector binding by activating the associated protein kinase (e.g. the Src kinase family: 
Src, Blk, Fgr, Fyn, Lck, Lyn, Yes; and the Janus kinase family: Jak1-3, Tyk2). 

Dysfunction of kinases, e.g. caused by defective regulation, can be the cause of 
inflammatory diseases and uncontrolled proliferation. v-Src, which is a truncated version of 
the c-Src protooncogene tyrosine kinase, is a classical example of this process, as v-Src does 
not contain the regulatory domain of the cellular gene and is thus constitutively active. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Signal transduction". They include, among others, the following: 

Neurocalcin (Recoverin) : Neurocalcin is a Ca(2+)-binding protein with three putative 
Ca(2+)-binding domains (EF-hands). In cattle, 6 isoforms are differentially expressed in the 
central nervous system, retina and adrenal gland. Homology with recoverin indicates 
involvement in Ca2+-dependent activation of guanylate cyclase. These proteins can find 
application in modulating/blocking the guanylate cyclase pathway. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM: 1) autosomal dominant cone dystrophy (OMIM *600364); 2) 
cone dystrophy 3 (OMIM *600364); 3) cancer associated retinopathy (OMIM *179618). 
Clones in this category include: fbr2_23b21. 

Proteins with a WW Domain : These proteins contain a WW domain, which was 
originally described as a short conserved region in a number of unrelated proteins, among 
them dystrophin, the product of the gene responsible for Duchenne muscular dystrophy. The 
domain, which spans about 35 residues, is repeated up to 4 times in some proteins. It has been 
shown to bind proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat 
resembles SH3 domains. This domain is frequently associated with other domains typical for 
proteins in signal transduction processes. Examples of proteins containing the WW domain are 
Dystrophin, Utrophin, vertebrate YAP protein (binds the SH3 domain of the Yes 
oncoprotein), murine NEDD-4 (embryonic development and differentiation of the central 
nervous system), and IQGAP (human GTPase activating protein acting on ras). Therefore these 
proteins should be involved in intracellular signal transduction. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM: 1) Muscular Dystrophy, Pseudohypertrophic Progressive, 
Duchenne and Becker Types (OMIM *310200). Clones in this category include: fbr2_23nl6. 
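
The proline motif [AP]-P-P-[AP]-Y quoted above is written in a PROSITE-like notation. As a hedged sketch (the helper names prosite_to_regex and find_ww_ligand_motifs and the test sequence are hypothetical, and only the simplest pattern elements are handled), such a pattern can be converted to a regular expression and used to flag candidate WW-binding sites:

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern such as '[AP]-P-P-[AP]-Y' to a regex.

    Only single residues, bracketed residue sets and 'x' (any residue) are supported.
    """
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append("[A-Z]")
        elif re.fullmatch(r"[A-Z]|\[[A-Z]+\]", element):
            parts.append(element)
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return re.compile("".join(parts))


WW_LIGAND = prosite_to_regex("[AP]-P-P-[AP]-Y")


def find_ww_ligand_motifs(sequence):
    """Return 1-based start positions of candidate WW-binding proline motifs."""
    return [m.start() + 1 for m in WW_LIGAND.finditer(sequence.upper())]


# Hypothetical sequence used only to exercise the function.
print(find_ww_ligand_motifs("GSPPPPYTV"))
```

A hit indicates only that the sequence is compatible with the motif; it does not show that a WW domain binds there.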

Protein substrates for cAMP-dependent protein kinase : Acting as a chloride channel or 
chloride channel inhibitor, these proteins have been associated (as potentially diagnostic, 
therapeutic, causative, and/or related, etc.), as reported by OMIM, with Cystic Fibrosis 
(OMIM #219700). Clones in this category include fbr2_82il7. 

Sphingosine kinase : Sphingosine kinase is a new type of lipid kinase, which is 
regulated by growth factors. The enzyme phosphorylates sphingosine, which subsequently 
exerts intracellular and extracellular actions. Intracellularly, sphingosine 1-phosphate (SPP) 
promotes proliferation and inhibits apoptosis. In yeast, survival of cells exposed to heat shock 
is dependent on SPP. Extracellularly, SPP inhibits cell motility and influences cell 
morphology, effects that appear to be mediated by the G protein-coupled receptor EDG1. 
These proteins have been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.), as reported by OMIM, with Gaucher Disease, Type I (OMIM *230800). Clones 
in this category include fbr2_82m6. 

Vanilloid Receptors : VR1 seems to play an important role in the activation and 
sensitization of nociceptors. It is the receptor for, e.g., capsaicin, a natural product of 
capsicum peppers and a selective activator of nociceptors. Related proteins can find 
application as targets for the development of new nociception-modulating drugs. Clones in 
this category include tes3_20k2. 

RCC1 (Regulator of chromosome condensation): RCC1 is a eukaryotic protein which 
binds to chromatin and interacts with Ran, a 
nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP with GTP, acting 
as a guanine-nucleotide dissociation stimulator. These proteins can find application in the 
regulation of gene expression by activation of nuclear GTP-binding proteins. X-linked 
retinitis pigmentosa is the result of a defective GTPase regulator, which contains an RCC1-type 
repeat. OMIM also reports that RCC1 has associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with retinitis pigmentosa (OMIM *312610). Clones in this 
category include tes3_21d4. 

Ras inhibitor proteins : Ras is a signal-transducing molecule involved in the receptor 
tyrosine kinase/Ras/MAP kinase signalling cascade. Ras proteins bind GDP/GTP and show 
intrinsic GTPase activity. Mutations in ras that change amino acids 12, 13 or 61 activate the 
potential of ras to transform cultured cells and are implicated in a variety of human tumours. 
Ras inhibitor proteins have been associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with many disease processes as reported by OMIM, including: 1) 
Tumors of the lung, breast, brain, pituitary, pancreas, bone, skin, bladder, kidney, ovary, 
prostate and lymphocyte, Melanoma (OMIM *600160); 2) X-linked non-specific mental 
retardation (OMIM *300104); 3) adenomatous polyposis of the colon (OMIM *175100); 4) 
Beckwith-Wiedemann Syndrome (OMIM #130650); and 5) Major affective disorder 1 (OMIM 
*125480). Clones in this category include utel_22g21. 

Mammalian proteins cornicon involving the EGF-receptor : Cornicon proteins are part 
of a signal transduction pathway involving the EGF-receptor. The EGF-receptor has been 
reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with the following diseases: 1) Familial hypercholesterolemia (OMIM 
143890); 2) Leprechaunism (OMIM #246200); 3) Hemophilia B (OMIM *306900); 4) 
Ectodermal dysplasia 1; 5) Kartagener syndrome (OMIM *244400); and 6) Glioma of the 
brain (OMIM *137800). Clones in this category include utel_22el2. 

Transmembrane proteins 

Membrane region prediction was effected using the ALOM2 software (Klein et al., 
1985; version 2 by K. Nakai). As in many other methods, the Kyte & Doolittle (1982) 
amino acid hydrophobicity scale is used in ALOM2 as the primary variable for classifying 
sequences in terms of their localization. High prediction accuracy is achieved through the 
system of intelligent decision rules and the utilization of a carefully selected training data set. 
The method also generates reliability estimates, which make it possible to distinguish 
between membrane-spanning proteins (I, intrinsic) and globular proteins with regions of high 
hydrophobicity buried in the core. 

For a protein of length L, the block of length l with maximum hydrophobicity is 
found: 

\[ \mathrm{maxH} = \max_{k=1,\dots,L-l+1} \; \frac{1}{l} \sum_{i=k}^{k+l-1} H_i \]

where H_i represents the hydrophobicity of an individual residue. 



Let P(I/maxH) and P(E/maxH) be the conditional probabilities that a protein is 
integral or peripheral, respectively, given its value of maximal hydrophobicity maxH, and let 
P(I) and P(E) be the prior probabilities of intrinsic and extrinsic membrane proteins estimated 
from the training set. Then a sequence is assigned to E if 

P(E/maxH) > P(I/maxH) 

or, after applying the Bayes rule, 

P(E)P(maxH/E) > P(I)P(maxH/I), 

where the conditional probabilities P(maxH/E) and P(maxH/I) can be determined 
based on the estimates of probability distributions of maxH in both groups. 
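
The quantity maxH that enters these probabilities is a sliding-window maximum of the mean residue hydrophobicity. The following is a minimal sketch, assuming the Kyte & Doolittle scale and a window of 17 residues as mentioned in the text; the function name max_hydrophobicity is hypothetical and the code is not the ALOM2 implementation.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobicity(sequence, window=17):
    """Return (maxH, k): the highest mean hydropathy of any block of
    `window` residues and the 1-based start position k of that block."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    block_sum = sum(values[:window])
    best_sum, best_start = block_sum, 0
    # Slide the window one residue at a time, updating the running sum.
    for k in range(1, len(values) - window + 1):
        block_sum += values[k + window - 1] - values[k - 1]
        if block_sum > best_sum:
            best_sum, best_start = block_sum, k
    return best_sum / window, best_start + 1


# Toy sequence and a short window, chosen only to keep the example readable.
print(max_hydrophobicity("DDLLLDD", window=3))
```

Non-standard residue codes raise a KeyError in this sketch; a production tool would need an explicit policy for them.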

Discriminant analysis makes it possible to simplify this task by calculating the odds 
P(E/maxH):P(I/maxH) as e^b, where b is the left-hand side of a linear or quadratic inequality. 
For example, for the window of length 17, the protein is allocated to the peripheral category E 
based on the empirically derived quadratic inequality: 

1.05(maxH)^2 + 12.30maxH + 17.49 > 0, 

whereas the optimal inequality for assigning membrane proteins (category I) is linear: 

-9.02maxH + 14.27 > 0 

The odds parameter can be made more or less stringent. For example, one can require 
odds at least 1 : 10 for a protein to be classified as integral. This leads to higher selectivity but 
less sensitivity. 
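
The Bayes decision rule and the effect of a stricter odds requirement can be sketched as follows. The class-conditional distributions are assumed here to be normal, and the priors, means and standard deviations are hypothetical placeholders, not values from the ALOM2 training set. With normal densities the logarithm of the odds is a quadratic function of maxH (linear when both variances are equal), which matches the form of b described above.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density, used here as a stand-in for P(maxH | class)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def odds_peripheral_to_integral(max_h, prior_e, prior_i, dist_e, dist_i):
    """Odds P(E|maxH) : P(I|maxH) = P(E)P(maxH|E) / (P(I)P(maxH|I))."""
    return (prior_e * normal_pdf(max_h, *dist_e)) / (prior_i * normal_pdf(max_h, *dist_i))


def classify(max_h, prior_e, prior_i, dist_e, dist_i, max_odds_for_integral=1.0):
    """Return 'I' when the odds E:I do not exceed the cutoff, otherwise 'E'."""
    odds = odds_peripheral_to_integral(max_h, prior_e, prior_i, dist_e, dist_i)
    return "I" if odds <= max_odds_for_integral else "E"


# Hypothetical priors and (mean, sd) pairs, chosen only to make the example run.
PRIOR_E, PRIOR_I = 0.7, 0.3
DIST_E, DIST_I = (1.0, 0.4), (2.4, 0.5)

for max_h in (0.9, 1.9, 2.6):
    plain = classify(max_h, PRIOR_E, PRIOR_I, DIST_E, DIST_I)
    strict = classify(max_h, PRIOR_E, PRIOR_I, DIST_E, DIST_I, max_odds_for_integral=0.1)
    print(max_h, plain, strict)
```

Lowering max_odds_for_integral from 1.0 to 0.1 corresponds to requiring odds of at least 1:10 before a protein is called integral, so borderline cases move from I to E.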

The boundaries of membrane-spanning regions in putative membrane proteins are 
detected by means of an iterative procedure whereby the most hydrophobic region 
corresponding to the value maxH is considered to be membrane and removed from the 
sequence. The classification procedure is then repeated for the remaining sequence, 
and, if such a protein is again classified as integral, the next most hydrophobic region is 
considered. 

Reference: Klein, P., Kanehisa, M., DeLisi, C. (1985) The detection and 
classification of membrane-spanning proteins. Biochim. Biophys. Acta 815: 468-476 
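
The iterative boundary detection described above can be sketched as follows. Several points are assumptions made for illustration: the decision rule is a plain threshold on maxH (the value 1.6 is a placeholder, not the ALOM2 discriminant), "removal" of a region is implemented by masking its residues so that coordinates are preserved, and the function name find_membrane_segments is hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_membrane_segments(sequence, window=17,
                           is_integral=lambda max_h: max_h >= 1.6):
    """Iteratively collect (start, end, maxH) for windows judged membrane-spanning.

    `is_integral` is a placeholder decision rule; positions are 1-based, inclusive.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    segments = []
    while True:
        best = None
        for k in range(len(values) - window + 1):
            block = values[k:k + window]
            if None in block:            # overlaps a region already removed
                continue
            mean_h = sum(block) / window
            if best is None or mean_h > best[0]:
                best = (mean_h, k)
        # Stop when no window is left or the best one is no longer called integral.
        if best is None or not is_integral(best[0]):
            break
        mean_h, k = best
        segments.append((k + 1, k + window, mean_h))
        values[k:k + window] = [None] * window   # mask the region, keep coordinates
    return sorted(segments)


# Artificial sequence with two hydrophobic stretches, for illustration only.
toy = "D" * 5 + "L" * 17 + "D" * 10 + "I" * 17 + "D" * 5
print([(start, end) for start, end, _ in find_membrane_segments(toy)])
```

Any classifier, including one based on the odds e^b, can be supplied through the is_integral argument in place of the placeholder threshold.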



Transcription factors 

Purified eukaryotic RNA polymerase II (RNAPII) is unable to initiate promoter-specific 
transcription. A family of factors that collectively confer RNAPII promoter specificity is 
known as the general transcription factors (GTFs). They include the TATA-binding protein 
(TBP), TFIIB, TFIIE, TFIIF and TFIIH. These factors are conserved among all eukaryotes. 

RNAPII complexes containing the entire set of GTFs or a subset of GTFs together 
with other proteins have been isolated from mammalian and yeast cells. Although purified 
RNAPII and GTFs are sufficient for promoter-specific initiation, this system fails to respond 
to activators. The response to activators is mediated by a further complex, termed the 
mediator complex, which associates with the carboxy-terminal heptapeptide domain (CTD) of 
the largest subunit of RNAPII. 

Purification of human RNAPII complexes resulted in two distinct forms of human 
RNAPII after analysis of functional properties. One complex contained chromatin remodeling 
activities but was devoid of GTFs. The other complex did not contain factors that modify 
chromatin but contained a subset of SRB/mediator subunits and GTFs and other polypeptides 
that mediate transcriptional activation, a scenario similar to that reported for yeast. 

A complex designated NAT (~20 SU), for negative regulator of transcription, contains 
RNAPII, Cdk8, homologs of the yeast mediator complex as well as Rgr1 and Srb10/11, 
known as negative regulators of transcription. 

A complex with strikingly similar structural and functional properties to NAT has been 
identified and designated SMCC (~15 SU) (SRB/mediator coactivator complex); it can also 
mediate transcriptional activation. 

The SMCC complex includes all reported NAT subunits including subunits of the 
TRAP complex. TRAP is a coactivator complex isolated on the basis of its interaction with 
the thyroid hormone receptor. Another coactivator complex, DRIP, isolated on the basis of its 
ability to interact with the vitamin D3 receptor, contains novel subunits as well as subunits of 
NAT/SMCC and TRAP complexes. 

The effects of each of these coactivator complexes are dependent on the TFIID 
complex. It is not known if the TAF subunits of TFIID are required. It is likely that new 
coactivator complexes will be uncovered containing both novel and previously defined 
components. 

Besides the large number of transcription factors which can be part of the RNAPII 
holoenzyme or the coactivator complexes, there is an even larger number of specific 
transcription factors that bind to promoter elements within the DNA sequences of a given gene, 
leading to activation or repression of transcription. A broad range of cellular responses such as 
differentiation, proliferation, cell death and others are elicited through activating or 
repressing the transcription of target genes. 

There are at least five superclasses of transcription factors: 

1. Superclass contains members with characteristic basic domains. 
Members are: 

Leucine zipper factors, where the basic domain is followed by a leucine zipper of 
repeated leucine residues at every seventh position. The zipper mediates protein dimerization 
as a prerequisite for DNA-binding. 

Helix-loop-helix factors (bHLH) contain a DNA-binding basic region followed by a 
motif of two potential amphipathic alpha-helices connected by a loop of variable length also 
mediating dimerization. 

Factors with a combination of Helix-loop-helix and leucine zipper. 

Further members of this superclass are NF-1, RF-X, and bHSH like proteins. 

2. Superclass comprises factors containing zinc-coordinating DNA-binding domains. 
Members are: 

Proteins with Cys4 zinc finger of nuclear receptor type, where two such motifs 
differing in size, composition and function are present in each receptor molecule. Each finger 
comprises 4 cysteine residues coordinating one zinc ion. The second half including the 
second cysteine pair has alpha-helix conformation and the helix of the first finger binds to the 
DNA through the major groove. The sequence between the first two cysteines of the second 
finger mediates dimerization upon DNA-binding. This class includes the steroid hormone 
receptors and the thyroid hormone receptor-like factors. Other diverse Cys4 zinc fingers have 
a motif of GATA-type. 

Proteins with Cys2His2 zinc finger domain(s). Each finger comprises 2 cysteine and 2 
histidine residues coordinating one zinc ion, and in some cases one histidine is replaced by 
another cysteine. The zinc ion is essential for DNA-binding. 

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine residues coordinate two zinc 
ions, i.e. two of the thiol groups coordinate two zinc ions each. These clusters are present 
in many fungal regulators. 

Zinc fingers of alternating composition. 

3. Superclass contains factors of helix-turn-helix type. 

Members are: 

Proteins with homeo domains. Homeo domains comprise three consecutive alpha-helices.
Helix 3 mainly contacts the major groove of the DNA, although some contacts at the minor
groove are observed as well. Helices 2 and 3 resemble the helix-turn-helix structure of
prokaryotic regulators.

Proteins with Paired box domain(s). This is a DNA-binding domain of approximately 
130 amino acid residues. Its N-terminal half is basic, its C-terminal half is highly charged in 
general. It probably comprises 3 alpha-helices. 

Proteins with Fork head / winged helix domain(s). This domain was identified by
homology between HNF-3A and fkh. The domain comprises approximately 110 amino acid
residues. Analysis of the crystal structure has revealed a compact structure of three
alpha-helices, the third alpha-helix being exposed towards the major groove of the DNA.
The domain also makes minor groove contacts. Upon binding to DNA, it induces a bend of
13 degrees.



Heat shock factors 

Proteins with Tryptophan clusters. The tryptophan clusters comprise several
tryptophan residues with a spacing of 12-21 amino acid residues; the subclass of myb-type
DNA-binding domains typically exhibits a spacing of 19-21 amino acid residues.

Proteins with TEA domain(s). The TEA domain has been identified as a region which 
is conserved among the transcription factors TEF-1, TEC1 and abaA. This domain in TEF-1 
has been shown to interact with DNA, although two additional regions may also contribute to 
DNA-binding. It is predicted to fold into three alpha-helices, with a randomly coiled region of 
16-18 amino acid residues between helices 1 and 2, and a short stretch between helices 2 and 
3 of 3-8 residues. 
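As a purely illustrative aside, the tryptophan spacing stated above for tryptophan cluster proteins (12-21 residues in general, 19-21 for myb-type domains) lends itself to a simple sequence check. The Python sketch below is an assumption-laden example rather than a disclosed method: spacing is taken to mean the difference between the positions of successive tryptophans, every tryptophan in the input is treated as part of the cluster, and the function names and the minimum of three tryptophans are hypothetical.

```python
def tryptophan_spacings(seq):
    """Return the distances between successive tryptophan (W) positions."""
    positions = [i for i, residue in enumerate(seq.upper()) if residue == "W"]
    return [b - a for a, b in zip(positions, positions[1:])]


def looks_like_trp_cluster(seq, lo=12, hi=21, min_trp=3):
    """True if seq has at least min_trp tryptophans, all spaced lo..hi apart.

    lo and hi default to the general 12-21 range; pass lo=19 for the
    narrower myb-type range. min_trp is an illustrative assumption.
    """
    gaps = tryptophan_spacings(seq)
    return len(gaps) + 1 >= min_trp and all(lo <= g <= hi for g in gaps)


if __name__ == "__main__":
    # Made-up sequence with three tryptophans, 19 positions apart.
    candidate = "W" + "A" * 18 + "W" + "A" * 18 + "W"
    print(tryptophan_spacings(candidate))
    print(looks_like_trp_cluster(candidate, lo=19, hi=21))
```

In practice the check would be applied to a window around a suspected DNA-binding domain rather than to a whole protein, since tryptophans elsewhere in the sequence would otherwise break the spacing test.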

4. Superclass contains beta-Scaffold Factors with Minor Groove Contacts 
Members are: 

Proteins with RHR (Rel homology) region. 

The structure of the Rel-type DBD exhibits a bipartite subdomain structure, each 
subdomain comprising a beta-barrel with five loops that form an extensive contact surface to 
the major groove of the DNA. In particular, the first loop of the N-terminal subdomain (the
highly conserved recognition loop) makes contacts with the recognition element on the
DNA, but other loops are also involved. The fact that the main DNA-contacts are made through
loops has been suggested to provide a high degree of flexibility in binding to a range of
different target sequences. Augmenting interactions are achieved by two alpha-helices within
the N-terminal part that form strong minor groove contacts to the A/T-rich center of the
B-element. In p65, the sequence between both alpha-helices is much shorter and even helix 2 is
truncated. The second, C-terminal domain is necessary mainly for protein dimerization. 

p53 proteins

MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of this class comprise
a region of homology. The DNA-binding domain also provides the dimerization capability.
In the DNA-bound dimer (shown for SRF), two antiparallel amphipathic alpha-helices
(alpha-I) form a coiled coil and are oriented approximately parallel on the minor groove.
These helices make minor and major groove contacts, and the N-terminal extensions form
minor groove contacts. The bound DNA is bent and wrapped around the protein. It exhibits a
compressed minor groove in the center and a widened minor groove in the flanks.

Beta-Barrel alpha-helix transcription factors. 

TATA-binding proteins 

HMG proteins 

Proteins of this class comprise a region of homology with the chromosomal
non-histone HMG proteins such as HMG1. This region comprises the DNA-binding domain,
which in some instances, such as HMG1, mediates sequence-unspecific binding and in other
cases, such as LEF-1, sequence-specific binding to DNA. This domain exhibits a typical L-shaped
conformation made up of 3 alpha-helices and an extended N-terminal extension of the first
helix. The latter together with helix 1, which contains a kink, form the long arm of the L,
whereas helices 1 and 2 form the short arm. Binding to the minor groove induces a sharp
bending of the DNA by more than 90 degrees, away from the bound protein. The overall
topology of the DNA-protein complexes resembles somewhat that of the TBP-TATA box 
complex. 

Heteromeric CCAAT factors 

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain (CSD) proteins are characterized by a highly
conserved region first found in prokaryotic cold-shock proteins. This domain is a
single-stranded nucleic acid-binding structure interacting with DNA or RNA. It consists of an
antiparallel five-stranded beta-barrel, the strands of which are connected by turns and loops.
Within this structure, a three-stranded beta-sheet contains a conserved RNA-binding motif,
RNP1. Not all CSD proteins are transcription factors. Those which specifically bind to a
certain sequence are termed Y-box proteins. Proteins of this class were previously called
protamine-like domain proteins because they have a highly positively charged domain with
interspersed proline residues.

Proteins with Runt homology domain 

The members of this transcription factor class have been identified on the basis of 
their homology to a defined region within the Drosophila protein Runt. The runt domain is
part of the DNA-binding domain of these factors. It consists mainly of beta-strands, does not 
contain alpha-helical regions and seems to be most similar to the palm domain found in DNA 
polymerase beta (rat). 

5. Superclass contains other transcription factors, such as copper fist proteins, HMGI(Y),
STAT, pocket domain proteins and AP2/EREBP-related factors.

The classification of transcription factors originates from the TRANSFAC database:

http://transfac.gbf.de/TRANSFAC/

Reference: Heinemeyer 
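For illustration, the five-superclass scheme summarized above can be held in a small lookup table, for example when annotating clones by transcription factor class. The Python sketch below is only one possible representation. The labels are abbreviated from the class descriptions in this section, the garbled fifth-superclass entry is rendered as HMGI(Y) on the assumption that this is what was meant, and the names SUPERCLASSES and find_superclass are hypothetical.

```python
# Superclass number -> (short label, classes named in the text above).
SUPERCLASSES = {
    1: ("basic domains", [
        "leucine zipper factors",
        "helix-loop-helix factors",
        "helix-loop-helix / leucine zipper factors",
        "NF-1", "RF-X", "bHSH",
    ]),
    2: ("zinc-coordinating DNA-binding domains", [
        "Cys4 zinc finger of nuclear receptor type",
        "diverse Cys4 zinc fingers",
        "Cys2His2 zinc finger domain",
        "Cys6 cysteine-zinc cluster",
        "zinc fingers of alternating composition",
    ]),
    3: ("helix-turn-helix", [
        "homeo domain", "paired box", "fork head / winged helix",
        "heat shock factors", "tryptophan clusters", "TEA domain",
    ]),
    4: ("beta-scaffold factors with minor groove contacts", [
        "RHR (Rel homology region)", "p53", "MADS box",
        "beta-barrel alpha-helix transcription factors",
        "TATA-binding proteins", "HMG", "heteromeric CCAAT factors",
        "grainyhead", "cold-shock domain factors", "runt",
    ]),
    5: ("other transcription factors", [
        "copper fist proteins", "HMGI(Y)",  # HMGI(Y): assumed reading
        "STAT", "pocket domain proteins", "AP2/EREBP-related factors",
    ]),
}


def find_superclass(class_name):
    """Return (superclass number, label) for a class name, or None if unknown.

    Matching is exact apart from case and surrounding whitespace.
    """
    key = class_name.strip().lower()
    for number, (label, classes) in SUPERCLASSES.items():
        if any(key == c.lower() for c in classes):
            return number, label
    return None


if __name__ == "__main__":
    print(find_superclass("MADS box"))
```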

Several categories of proteins are coded for by clones of the invention within the
overall group of "Transcription Factors" and include, among others, the following:

Dcoh: Dcoh is a bifunctional protein, complexed with biopterin. It serves as
dimerization cofactor of hepatocyte nuclear factor-1 and catalyzes the dehydration of the
biopterin cofactor of phenylalanine hydroxylase. The Dcoh protein has been reported by
OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or related,
etc.) with the following diseases: 1) hyperphenylalaninemia (OMIM 126090, #264070).
Clones in this category include fkd2_46kl2.

Signal transducing proteins: Beta-transducin subunits of G-proteins contain WD-40
repeats. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. Due to the zinc finger, the novel protein
seems to be a new molecule involved in signal transduction and transcription. These proteins
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with the following diseases: 1) essential hypertension
(OMIM *139130). Clones in this category include utel_li2.

* * *

The invention, therefore, specifically contemplates the following assemblages of 
materials, which track the above-identified fourteen functional groupings, that are useful in 
practicing the profiling aspects of the invention. One type of assemblage is nucleic
acid-based and can include the following groupings of sequences and their derivatives: all
sequences; human fetal brain sequences; brain derived sequences; human fetal kidney 
library sequences; kidney derived sequences; human mammary carcinoma library 
sequences; mammary carcinoma derived sequences; human testis library sequences; testes 
derived sequences; cell cycle genes; cell structure and motility genes; differentiation and 
development genes; intracellular transport and trafficking genes; metabolism genes; nucleic 
acid management genes; signal transduction genes; transmembrane protein genes; and 
transcription factor genes. Other assemblages contain proteins or their corresponding 
antibodies or antibody fragments, divided along the same groupings. 

Database Applications 

Because they are human genes and gene products, the inventive molecules are useful
as members of a database. Such a database may be used, for example, in drug discovery
and rational drug design or in testing the novelty and non-obviousness of newly sequenced
materials. In addition, they are particularly suited to designing variants for the profiling
(and other) applications described herein. Hence, the following discussion of electronic
embodiments applies equally to such variants, which, naturally, will be generated and
stored on a computer using known methodologies.

Accordingly, one aspect of the invention contemplates a database of at least one of 
the inventive sequences stored on computer readable media. Again, the individual 
sequences may be grouped with regard to the individual functional and structural groups 
mentioned above. While the individual sequences of a database may exist in printed form, 
they are preferably in electronic form, as in an ASCII or other text file. They may also exist as
word processing files or they may be stored in database applications like DB2, Sybase, 
Oracle, GCG and GenBank. One skilled in the art will understand the range of applications 
suitable for using and storing the electronic embodiments of the invention. 
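One way such a text-file embodiment might look is sketched below in Python, offered only as a non-limiting illustration. Records are written to an ASCII file in a FASTA-like layout, with the functional grouping carried in the header line, and are read back grouped by that label. The header convention ("group="), the function names, the clone identifiers and the nucleotide strings are all placeholders assumed for this example; none of them is taken from the sequence listing.

```python
import os
import tempfile
from collections import defaultdict


def write_records(path, records, width=60):
    """Write (identifier, group, sequence) tuples to an ASCII text file."""
    with open(path, "w", encoding="ascii") as handle:
        for rec_id, group, seq in records:
            handle.write(f">{rec_id} group={group}\n")
            # Wrap the sequence so that no line exceeds `width` characters.
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def read_by_group(path):
    """Read the file back as {group: {identifier: sequence}}."""
    groups = defaultdict(dict)
    rec_id = group = None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                rec_id, _, group = line[1:].partition(" group=")
                groups[group][rec_id] = ""
            elif rec_id is not None:
                groups[group][rec_id] += line
    return dict(groups)


if __name__ == "__main__":
    # Placeholder records; identifiers and sequences are invented.
    demo = [
        ("cloneA", "transcription factor genes", "ACGT" * 20),
        ("cloneB", "cell cycle genes", "GGCC"),
    ]
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "sequences.txt")
        write_records(target, demo)
        print(sorted(read_by_group(target)))
```

Keeping the grouping in the header line means the same file can be filtered by functional or structural group without a separate index, at the cost of a parsing convention that each reader has to share.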

"Computer readable media" refers to any medium which can be read and accessed 
by a computer. These include: magnetic storage media, like floppy disks, hard drives and
magnetic tape; optical storage media, like CD-ROM; electrical storage media, like RAM
and ROM; and hybrids of these categories, like magnetic/optical storage media. One
skilled in the art will readily understand the scope of computer readable media and how to 
implement them. 

Biological Activities and Assays for Implementing Therapeutic and Diagnostic 
Applications 

This section provides assays for biological activity that are useful in characterizing 
and quantifying the biological activity of the inventive molecules and their derivatives, 
which is relevant to the pharmacological effects of the inventive molecules. As used in this 
section, it will be understood that "protein" may also refer to the inventive antibodies 
(including fragments). 

Cytokine and Cell Proliferation/Differentiation Activity 

A protein of the present invention may exhibit cytokine, cell proliferation (either 
inducing or inhibiting) or cell differentiation (either inducing or inhibiting) activity or may 
induce production of other cytokines in certain cell populations. Many protein factors 
discovered to date, including all known cytokines, have exhibited activity in one or more 
factor dependent cell proliferation assays, and hence the assays serve as a convenient 
confirmation of cytokine activity. The activity of a protein of the present invention is 
evidenced by any one of a number of routine factor dependent cell proliferation assays for 
cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3,
MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and
CMK.

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for T-cell or thymocyte proliferation include without limitation those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. 
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley- 
Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter 
7, Immunologic studies in Humans); Takai et al., J. Immunol. 137:3494-3500, 1986; 
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular 
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992;
Bowman et al., J. Immunol. 152:1756-1761, 1994.

Assays for cytokine production and/or proliferation of spleen cells, lymph node cells
or thymocytes include, without limitation, those described in: Polyclonal T cell stimulation,
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in Immunology. J. E.
Coligan et al., eds. Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; and
Measurement of mouse and human interferon gamma, Schreiber, R. D. In Current
Protocols in Immunology. J. E. Coligan et al., eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and
Sons, Toronto. 1994.

Assays for proliferation and differentiation of hematopoietic and lymphopoietic cells
include, without limitation, those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and Lipsky, P. E. In Current
Protocols in Immunology. J. E. Coligan et al., eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and
Sons, Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al.,
Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938,
1983; Measurement of mouse and human interleukin 6, Nordan, R. In Current
Protocols in Immunology. J. E. Coligan et al., eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and
Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986;
Measurement of human Interleukin 11, Bennett, F., Giannotti, J., Clark, S. C. and Turner,
K. J. In Current Protocols in Immunology. J. E. Coligan et al., eds. Vol 1 pp. 6.15.1, John
Wiley and Sons, Toronto. 1991; Measurement of mouse and human Interleukin 9, Ciarletta,
A., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J.
E. Coligan et al., eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991.

Assays for T-cell clone responses to antigens (which will identify, among others, 
proteins that affect APC-T cell interactions as well as direct T-cell effects by measuring 
proliferation and cytokine production) include, without limitation, those described in: 
Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H. 
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function; Chapter 6,
Cytokines and their cellular receptors; Chapter 7, Immunologic studies in Humans);
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al.,
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai
et al., J. Immunol. 140:508-512, 1988.

Immune Stimulating or Suppressing Activity

A protein of the present invention may also exhibit immune stimulating or immune 
suppressing activity, including without limitation the activities for which assays are 
described herein. A protein may be useful in the treatment of various immune deficiencies
and disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating
(up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the
cytolytic activity of NK cells and other cell populations. These immune deficiencies may be
genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may
result from autoimmune disorders. More specifically, infectious diseases caused by viral,
bacterial, fungal or other infection may be treatable using a protein of the present invention, 
including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania 
spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this 
regard, a protein of the present invention may also be useful where a boost to the immune 
system generally may be desirable, i.e., in the treatment of cancer. 

Autoimmune disorders which may be treated using a protein of the present invention 
include, for example, connective tissue disease, multiple sclerosis, systemic lupus 
erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre 
syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis,
graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein of the
present invention may also be useful in the treatment of allergic reactions and conditions,
such as asthma (particularly allergic asthma) or other respiratory problems. Other 
conditions, in which immune suppression is desired (including, for example, organ 
transplantation), may also be treatable using a protein of the present invention. 

Using the proteins of the invention it may also be possible to modify immune
responses in a number of ways. Down regulation may be in the form of inhibiting or
blocking an immune response already in progress or may involve preventing the induction 
of an immune response. The functions of activated T cells may be inhibited by suppressing 
T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression 
of T cell responses is generally an active, non-antigen-specific, process which requires 
continuous exposure of the T cells to the suppressive agent. Tolerance, which involves 
inducing non-responsiveness or anergy in T cells, is distinguishable from 
immunosuppression in that it is generally antigen-specific and persists after exposure to the 
tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T
cell response upon reexposure to specific antigen in the absence of the tolerizing agent. 

Down regulating or preventing one or more antigen functions (including without 
limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing 
high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, 
skin and organ transplantation and in graft-versus-host disease (GVHD). For example,
blockage of T cell function should result in reduced tissue destruction in tissue 
transplantation. Typically, in tissue transplants, rejection of the transplant is initiated 
through its recognition as foreign by T cells, followed by an immune reaction that destroys 
the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 
lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, 
monomeric form of a peptide having B7-2 activity alone or in conjunction with a 
monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g.,
B7-1, B7-3) or blocking antibody) prior to transplantation can lead to the binding of the
molecule to the natural ligand(s) on the immune cells without transmitting the
corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner
prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an 
immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize 
the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B 
lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of 
these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, 
it may also be necessary to block the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection 
or GVHD can be assessed using animal models that are predictive of efficacy in humans. 
Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats 
and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to 
examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in 
Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA
89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental 
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the 
effect of blocking B lymphocyte antigen function in vivo on the development of that 
disease.

Blocking antigen function may also be therapeutically useful for treating
autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation 
of T cells that are reactive against self tissue and which promote the production of cytokines 
and autoantibodies involved in the pathology of the diseases. Preventing the activation of 
autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents 
which block costimulation of T cells by disrupting receptor:ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation and prevent production of 
autoantibodies or T cell-derived cytokines which may be involved in the disease process. 
Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T 
cells which could lead to long-term relief from the disease. The efficacy of blocking 
reagents in preventing or alleviating autoimmune disorders can be determined using a 
number of well-characterized animal models of human autoimmune diseases. Examples 
include murine experimental autoimmune encephalitis, systemic lupus erythematosus in
MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes 
mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul 
ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856). 

Upregulation of an antigen function (preferably a B lymphocyte antigen function), as 
a means of up regulating immune responses, may also be useful in therapy. Upregulation of 
immune responses may be in the form of enhancing an existing immune response or 
eliciting an initial immune response. For example, enhancing an immune response through 
stimulating B lymphocyte antigen function may be useful in cases of viral infection. In 
addition, systemic viral diseases such as influenza, the common cold, and encephalitis 
might be alleviated by the administration of stimulatory forms of B lymphocyte antigens 
systemically. 

Alternatively, anti-viral immune responses may be enhanced in an infected patient
by removing T cells from the patient, costimulating the T cells in vitro with viral
antigen-pulsed APCs either expressing a peptide of the present invention or together with a
stimulatory form of a soluble peptide of the present invention and reintroducing the in vitro 
activated T cells into the patient. Another method of enhancing anti-viral immune responses 
would be to isolate infected cells from a patient, transfect them with a nucleic acid encoding 
a protein of the present invention as described herein such that the cells express all or a 
portion of the protein on their surface, and reintroduce the transfected cells into the patient. 
The infected cells would now be capable of delivering a costimulatory signal to, and
thereby activating, T cells in vivo.

In another application, up regulation or enhancement of antigen function (preferably 
B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor 
cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) 
transfected with a nucleic acid encoding at least one peptide of the present invention can be 
administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the 
tumor cell can be transfected to express a combination of peptides. For example, tumor 
cells obtained from a patient can be transfected ex vivo with an expression vector directing 
the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide 
having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned
to the patient to result in expression of the peptides on the surface of the transfected cell. 
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in 
vivo. 

The presence of the peptide of the present invention having the activity of a B 
lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation 
signal to T cells to induce a T cell mediated immune response against the transfected tumor 
cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or 
which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can
be transfected with nucleic acid encoding all or a portion (e.g., a cytoplasmic-domain
truncated portion) of an MHC class I alpha chain protein and beta 2 microglobulin protein
or an MHC class II alpha chain protein and an MHC class II beta chain protein to thereby 
express MHC class I or MHC class II proteins on the cell surface. Expression of the 
appropriate class I or class II MHC in conjunction with a peptide having the activity of a B 
lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response 
against the transfected tumor cell. Optionally, a gene encoding an antisense construct which 
blocks expression of an MHC class II associated protein, such as the invariant chain, can 
also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte 
antigen to promote presentation of tumor associated antigens and induce tumor specific 
immunity. Thus, the induction of a T cell mediated immune response in a human subject 
may be sufficient to overcome tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Suitable assays for thymocyte or splenocyte cytotoxicity include, without limitation, 
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. 
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Herrmann et al., Proc.
Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol.
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J.
Virology 61:1992-1998; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown
et al., J. Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses and isotype switching 
(which will identify, among others, proteins that modulate T-cell dependent antibody 
responses and that affect Th1/Th2 profiles) include, without limitation, those described in:
Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for B cell function: In vitro 
antibody production, Mond, J. J. and Brunswick, M. In Current Protocols in Immunology. 
J. E. e.a. Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994. 

Mixed lymphocyte reaction (MLR) assays (which will identify, among others, 
proteins that generate predominantly Th1 and CTL responses) include, without limitation,
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. 
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Takai et al., J. Immunol. 
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. 
Immunol. 149:3778-3783, 1992. 

Dendritic cell-dependent assays (which will identify, among others, proteins 
expressed by dendritic cells that activate naive T-cells) include, without limitation, those 
described in: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of 
Experimental Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology
154:5071-5079, 1995; Porgador et al., Journal of Experimental Medicine 182:255-260, 
1995; Nair et al., Journal of Virology 67:4062-4069, 1993; Huang et al., Science 264:961- 
965, 1994; Macatonia et al., Journal of Experimental Medicine 169:1255-1264, 1989; 
Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., 
Journal of Experimental Medicine 172:631-640, 1990. 

Assays for lymphocyte survival/apoptosis (which will identify, among others, 
proteins that prevent apoptosis after superantigen induction and proteins that regulate 
lymphocyte homeostasis) include, without limitation, those described in: Darzynkiewicz et 
al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et 
al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, 
Journal of Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; 
Gorczyca et al., International Journal of Oncology 1:639-648, 1992. 

Assays for proteins that influence early steps of T-cell commitment and development 
include, without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine 
et al., Cellular Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; 
Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.

Hematopoiesis Regulating Activity

A protein of the present invention may be useful in regulation of hematopoiesis and, 
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal 
biological activity in support of colony forming cells or of factor-dependent cell lines 
indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and 
proliferation of erythroid progenitor cells alone or in combination with other cytokines, 
thereby indicating utility, for example, in treating various anemias or for use in conjunction 
with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or 
erythroid cells; in supporting the growth and proliferation of myeloid cells such as 
granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for 
example, in conjunction with chemotherapy to prevent or treat consequent
myelosuppression; in supporting the growth and proliferation of megakaryocytes and
consequently of platelets thereby allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in place of or complementary to
platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic 
stem cells which are capable of maturing to any and all of the above-mentioned 
hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such 
as those usually treated with transplantation, including, without limitation, aplastic anemia 
and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell 
compartment post irradiation/chemotherapy, either in-vivo or ex-vivo (i.e., in conjunction 
with bone marrow transplantation or with peripheral progenitor cell transplantation 
(homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for proliferation and differentiation of various hematopoietic lines 
are cited above. 

Assays for embryonic stem cell differentiation (which will identify, among others, 
proteins that influence embryonic differentiation hematopoiesis) include, without limitation, 
those described in: Johansson et al. Cellular Biology 15:141-151, 1995; Keller et al., 
Molecular and Cellular Biology 13:473-486, 1993; McClanahan et al., Blood 81:2903- 
2915, 1993.

Assays for stem cell survival and differentiation (which will identify, among others,
proteins that regulate lympho-hematopoiesis) include, without limitation, those described 
in: Methylcellulose colony forming assays, Freshney, M. G. In Culture of Hematopoietic 
Cells. R. I. Freshney, et al. eds. Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 
1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; Primitive 
hematopoietic colony forming cells with high proliferative potential, McNiece, I. K. and 
Briddell, R. A. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 23- 
39, Wiley-Liss, Inc., New York, N.Y. 1994; Neben et al., Experimental Hematology 
22:353-359, 1994; Cobblestone area forming cell assay, Ploemacher, R. E. In Culture of 
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 1-21, Wiley-Liss, Inc., New York, 
N.Y. 1994; Long term bone marrow cultures in the presence of stromal cells, Spooncer, 
E., Dexter, M. and Allen, T. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. 
Vol pp. 163-179, Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture initiating cell 
assay, Sutherland, H. J. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol 
pp. 139-162, Wiley-Liss, Inc., New York, N.Y. 1994. 

Tissue Growth Activity 

A protein of the present invention also may have utility in compositions used for 
bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for 
wound healing and tissue repair and replacement, and in the treatment of burns, incisions 
and ulcers. 

A protein of the present invention, which induces cartilage and/or bone growth in 
circumstances where bone is not normally formed, has application in the healing of bone 
fractures and cartilage damage or defects in humans and other animals. Such a preparation 
employing a protein of the invention may have prophylactic use in closed as well as open 
fracture reduction and also in the improved fixation of artificial joints. De novo bone 
formation induced by an osteogenic agent contributes to the repair of congenital, trauma 
induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic 
plastic surgery. 

A protein of this invention may also be used in the treatment of periodontal disease, 
and in other tooth repair processes. Such agents may provide an environment to attract 
bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of 
progenitors of bone-forming cells. A protein of the invention may also be useful in the 
treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or
cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase 
activity, osteoclast activity, etc.) mediated by inflammatory processes. 

Another category of tissue regeneration activity that may be attributable to the 
protein of the present invention is tendon/ligament formation. A protein of the present 
invention, which induces tendon/ligament-like tissue or other tissue formation in 
circumstances where such tissue is not normally formed, has application in the healing of 
tendon or ligament tears, deformities and other tendon or ligament defects in humans and 
other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein 
may have prophylactic use in preventing damage to tendon or ligament tissue, as well as 
use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing 
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced 
by a composition of the present invention contributes to the repair of congenital, trauma 
induced, or other tendon or ligament defects of other origin, and is also useful in cosmetic 
plastic surgery for attachment or repair of tendons or ligaments. The compositions of the 
present invention may provide an environment to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors 
of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or 
progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the 
invention may also be useful in the treatment of tendonitis, carpal tunnel syndrome and 
other tendon or ligament defects. The compositions may also include an appropriate matrix 
and/or sequestering agent as a carrier as is well known in the art. 

The protein of the present invention may also be useful for proliferation of neural 
cells and for regeneration of nerve and brain tissue, i.e. for the treatment of central and 
peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic 
disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. 
More specifically, a protein may be used in the treatment of diseases of the peripheral 
nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized 
neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's 
disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. 
Further conditions which may be treated in accordance with the present invention include 
mechanical and traumatic disorders, such as spinal cord disorders, head trauma and 
cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from
chemotherapy or other medical therapies may also be treatable using a protein of the 
invention. 

Proteins of the invention may also be useful to promote better or faster closure of 
non-healing wounds, including without limitation pressure ulcers, ulcers associated with 
vascular insufficiency, surgical and traumatic wounds, and the like. 

It is expected that a protein of the present invention may also exhibit activity for 
generation or regeneration of other tissues, such as organs (including, for example, 
pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) 
and vascular (including vascular endothelium) tissue, or for promoting the growth of cells 
comprising such tissues. Part of the desired effects may be by inhibition or modulation of 
fibrotic scarring to allow normal tissue to regenerate. A protein of the invention may also 
exhibit angiogenic activity. 

A protein of the present invention may also be useful for gut protection or 
regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, 
and conditions resulting from systemic cytokine damage. 

A protein of the present invention may also be useful for promoting or inhibiting 
differentiation of tissues described above from precursor tissues or cells; or for inhibiting 
the growth of tissues described above. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for tissue generation activity include, without limitation, those described in: 
International Patent Publication No. WO95/16035 (bone, cartilage, tendon); International 
Patent Publication No. WO95/05846 (nerve, neuronal); International Patent Publication 
No. WO91/07491 (skin, endothelium).

Assays for wound healing activity include, without limitation, those described in: 
Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H. I. and Rovee, D. T., eds.),
Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. 
Invest. Dermatol 71:382-84 (1978). 

Activin/Inhibin Activity 

A protein of the present invention may also exhibit activin- or inhibin-related 
activities. Inhibins are characterized by their ability to inhibit the release of follicle 
stimulating hormone (FSH), while activins are characterized by their ability to
stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present 
invention, alone or in heterodimers with a member of the inhibin alpha family, may be 
useful as a contraceptive based on the ability of inhibins to decrease fertility in female 
mammals and decrease spermatogenesis in male mammals. Administration of sufficient 
amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein 
of the invention, as a homodimer or as a heterodimer with other protein subunits of the 
inhibin-beta group, may be useful as a fertility inducing therapeutic, based upon the ability
of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for
example, U.S. Pat. No. 4,798,885. A protein of the invention may also be useful for 
advancement of the onset of fertility in sexually immature mammals, so as to increase the 
lifetime reproductive performance of domestic animals such as cows, sheep and pigs. 

The activity of a protein of the invention may, among other means, be measured by
the following methods: 

Assays for activin/inhibin activity include, without limitation, those described in:
Vale et al., Endocrinology 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale 
et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., 
Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986. 

Chemotactic/Chemokinetic Activity 

A protein of the present invention may have chemotactic or chemokinetic activity 
(e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, 
fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. 
Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell 
population to a desired site of action. Chemotactic or chemokinetic proteins provide 
particular advantages in treatment of wounds and other trauma to tissues, as well as in 
treatment of localized infections. For example, attraction of lymphocytes, monocytes or 
neutrophils to tumors or sites of infection may result in improved immune responses against 
the tumor or infecting agent. 

A protein or peptide has chemotactic activity for a particular cell population if it can 
stimulate, directly or indirectly, the directed orientation or movement of such cell 
population. Preferably, the protein or peptide has the ability to directly stimulate directed 
movement of cells. Whether a particular protein has chemotactic activity for a population of 
cells can be readily determined by employing such protein or peptide in any known assay
for cell chemotaxis. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for chemotactic activity (which will identify proteins that induce or prevent 
chemotaxis) consist of assays that measure the ability of a protein to induce the migration of 
cells across a membrane as well as the ability of a protein to induce the adhesion of one cell 
population to another cell population. Suitable assays for movement and adhesion include, 
without limitation, those described in: Current Protocols in Immunology, Ed by J. E. 
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene
Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and
beta Chemokines 6.12.1-6.12.28); Taub et al. J. Clin. Invest. 95:1370-1376, 1995; Lind et
al. APMIS 103:140-146, 1995; Muller et al. Eur. J. Immunol. 25:1744-1748; Gruber et al.
J. of Immunol. 152:5860-5867, 1994; Johnston et al. J. of Immunol. 153:1762-1768, 1994. 

Hemostatic and Thrombolytic Activity 

A protein of the invention may also exhibit hemostatic or thrombolytic activity. As a 
result, such a protein is expected to be useful in treatment of various coagulation disorders 
(including hereditary disorders, such as hemophilias) or to enhance coagulation and other 
hemostatic events in treating wounds resulting from trauma, surgery or other causes. A 
protein of the invention may also be useful for dissolving or inhibiting formation of 
thromboses and for treatment and prevention of conditions resulting therefrom (such as, for
example, infarction of cardiac and central nervous system vessels (e.g., stroke)).

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for hemostatic and thrombolytic activity include, without limitation, those
described in: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., 
Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, 
Prostaglandins 35:467-474, 1988. 

Receptor/Ligand Activity 

A protein of the present invention may also demonstrate activity as receptors, 
receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such 
receptors and ligands include, without limitation, cytokine receptors and their ligands,
receptor kinases and their ligands, receptor phosphatases and their ligands, receptors 
involved in cell-cell interactions and their ligands (including without limitation, cellular 
adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs 
involved in antigen presentation, antigen recognition and development of cellular and 
humoral immune responses). Receptors and ligands are also useful for screening of 
potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A 
protein of the present invention (including, without limitation, fragments of receptors and 
ligands) may itself be useful as an inhibitor of receptor/ligand interactions.

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for receptor-ligand activity include without limitation those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D.
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley- 
Interscience (Chapter 7.28, Measurement of Cellular Adhesion under static conditions 
7.28.1-7.28.22), Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et 
al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160,
1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661- 
670, 1995. 

Anti-Inflammatory Activity 

Proteins of the present invention may also exhibit anti-inflammatory activity. The 
anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the 
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for 
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the 
inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or 
suppressing production of other factors which more directly inhibit or promote an 
inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory 
conditions (including chronic or acute conditions), including without limitation inflammation
associated with infection (such as septic shock, sepsis or systemic inflammatory response 
syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement- 
mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, 
inflammatory bowel disease, Crohn's disease or resulting from overproduction of
cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat
anaphylaxis and hypersensitivity to an antigenic substance or material. 



Tumor Inhibition Activity 

In addition to the activities described above for immunological treatment or 
prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A 
protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). 
A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor 
precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such 
as, for example, by inhibiting angiogenesis), by causing production of other factors, agents 
or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting 
factors, agents or cell types which promote tumor growth. 

Other Activities 

A protein of the invention may also exhibit one or more of the following additional 
activities or effects: inhibiting the growth, infection or function of, or killing, infectious 
agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting
(suppressing or enhancing) bodily characteristics, including, without limitation, height, 
weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ 
or body part size or shape (such as, for example, breast augmentation or diminution, 
change in bone form or shape); affecting biorhythms or circadian cycles or rhythms;
affecting the fertility of male or female subjects; affecting the metabolism, catabolism,
anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, 
carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); 
affecting behavioral characteristics, including, without limitation, appetite, libido, stress,
cognition (including cognitive disorders), depression (including depressive disorders) and 
violent behaviors; providing analgesic effects or other pain reducing effects; promoting 
differentiation and growth of embryonic stem cells in lineages other than hematopoietic 
lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of 
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative 
disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for 
example, the ability to bind antigens or complement); and the ability to act as an antigen in 
a vaccine composition to raise an immune response against such protein or another
material or entity which is cross-reactive with such protein. 



Particular Applications for Certain Clones 

The following sets out a non-exclusive list of applications for certain embodiments of 
the invention. In the interest of economy, applications relevant to multiple embodiments are 
not duplicated in this list. Other embodiments described below have similar
characteristics, as described therein. The artisan is directed, therefore, to this section for
similar descriptions of the functions of other embodiments.

Testes

htes3_15c24: The new protein can find application in modulation of 2-hydroxyacid
dehydrogenase-dependent pathways and as a new enzyme for biotechnologic
production processes. 

htes3_15i5: The new protein can find application in modulating the structure of the 
human spermatozoa radial spoke head and modulation of sperm motility in men.

htes3_l5kl1: The novel protein contains a protein kinase ATP-binding region
signature and a serine/threonine protein kinase active-site signature. The new protein 
can find application in modulation of intracellular signal pathways dependent on this 
kinase. 

htes3_17nl2: The new protein can find application in modulating/blocking the 
expression of SOX-controlled genes. 

htes3_20k2: The new protein can find application as a target for the development of 
new nociception-modulating drugs. 

htes3_20ml8: The new protein can find application in modulation of mitochondrial 
DNA replication and maintenance. 

htes3_20d4: The new protein can find application in the regulation of gene
expression by activation of nuclear GTP-binding proteins. X-linked retinitis
pigmentosa results from a defective GTPase regulator, which contains an RCC1-type
repeat.

htes3_21jl5: NY-CO-33 is a protein recognised by autologous antibodies of human
colon cancer patients. The novel protein contains four C2H2 zinc fingers and is a new
putative transcription factor. The new protein can find application in
modulating/blocking the expression of genes controlled by this transcription factor.

The new protein can find application in modulating chromosome transport in mitosis 
and meiosis and modulation of cell division. 

htes3_26g22: The new protein can find application in modulating chromosome 
transport in mitosis and meiosis and modulation of cell division. The novel TBP- 
binding protein is considered to participate in transcription regulation through the 
interaction with TBP. The new protein can find application in modulation of gene 
transcription. 

htes3_21116: The new protein can find application in modulation of protein 
translocation into the endoplasmic reticulum. 

htes3_27dl: The novel protein can find application in modulation of ubiquitin and
protein metabolism in cells.

htes3_2ml8: The novel protein can find application as a multifunctional
nuclease/exoribonuclease.

htes3_35b4: The new protein can find application in modulation of the mitotic 
spindle. 

htes3_35b5: The novel protein can find application in modulating the v-ATPase 
activity in endocytic and secretory organelles. 

htes3_35e21: Due to the close relationship to human interleukin-7, the novel
interleukin is expected to act as a new growth factor for human B lineage cells. 
Additionally, the protein should induce the gene rearrangement of the T-cell receptor 
repertoire, leading to thymocyte commitment, and subsequently induce both cytotoxic 
T-cell- and lymphocyte-activated killer cells. This new interleukin could find clinical 
application in a variety of conditions of hematolymphopoietic failure and different 
tumours, because of its recruitment of B cell lineage cells, cytotoxic T-cell- and 
lymphocyte-activated killer cells. 






WO 01/12659 PCT/IB00/01496 

htes3_35kl6: The novel protein is a new fatty acid-CoA synthetase/ligase with unknown
substrate. The new protein can find application in modulation of fatty acid 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35nl2: The new protein can find application in modulation of ADP-transport 
and energy metabolism in cells/mitochondria. 

htes3_35n9: The new protein can find application in modulation of carboxylester 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35p22: The novel protein is closely related to human tre-2 and other enzymes
involved in the degradation of ubiquitinated proteins. The human tre-2 oncogene 
encodes a deubiquitinating enzyme, indicating a role for the ubiquitin system in 
mammalian growth control. The novel protein can find application in cancer 
diagnostics and treatment, and in regulating protein stability and growth control via 
regulation of ubiquitination. 

htes3_4h6: The novel kinesin protein can find application in modulating the function 
of kinesin and modulating intracellular transport via/on microtubules. 

htes3_72kl5: FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel
F-actin-binding protein. The gene locus FGD1 seems to be responsible for faciogenital
dysplasia or Aarskog-Scott syndrome. Frabin binds F-actin and shows
F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 cells and COS7 cells induces
cell shape change and c-Jun N-terminal kinase activation, as described for FGD1.
Because FGD1 has been shown to serve as a GDP/GTP exchange protein for Cdc42
small G protein, it is likely that frabin is a direct linker between Cdc42 and the actin
cytoskeleton. Cdc42p is essential in yeast, where it transduces signals to the actin
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated
protein kinase pathways involved in morphogenesis. In mammalian cells, Cdc42p regulates a variety of
actin-dependent events and induces the JNK/SAPK protein kinase cascade, which leads to
the activation of transcription factors within the nucleus. The novel protein seems to 
be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as
well as the JNK/SAPK pathway.

htes3_72pl6: Like Mem3, the novel protein is similar to yeast VPS35 (vacuolar protein
sorting 35). In yeast, the null allele of VPS35 results in a differential defect in the
sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), proteinase B
(PrB), and alkaline phosphatase (ALP). The new protein can find application in
modulating the sorting of proteins into different compartments.

htes3_7b22: The novel protein is related to paramyosin, a major structural component 
of thick filaments in invertebrate muscle. Paramyosins are promising antigens for
immunization against several parasites, such as Schistosoma mansoni. The new
protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.

htes3_7j3: The new protein is closely related to C-TAK1 and therefore should be
involved in cell-cycle regulation, too. The new protein can find application in 
modulating/blocking the cell cycle. 

htes3_7p9: The nuclear domain (ND)10 also described as POD or Kr bodies is 
involved in the development of acute promyelocytic leukemia and virus-host 
interactions. The NDP52 protein is part of this complex structure. In vivo, NDP52 is 
transcribed in all human tissues, but is redistributed upon viral infection and interferon 
treatment. ND10 plays an important role in the viral life cycle. The novel protein is 
similar to NDP52. It contains three leucine zippers and an RGD cell attachment site.
This protein seems to be a novel part of the ND10 complex. The new protein can
find application in modulation of viral infections and tumour events. 

htes3_8ml0: The poly(A)-binding protein (PABP) binds to the messenger RNA (mRNA)
3'-poly(A) tail found on most eukaryotic mRNAs and together with the poly(A) tail 
has been implicated in governing the stability and the translation of mRNA. The new 
protein can find application in modulation of mRNA translation and 
processing/stability. 

Kidney 

hfkd2_24b15: The new protein can find application in modulation of hexose
metabolism pathways and as a new enzyme for biotechnologic production processes.

hfkd2_24n20: The new protein seems to be part of the signalling pathway between
tyrosine kinases and the membrane/cytoskeleton. The new protein can find
application in modulating cell adhesion/motility and membrane/cytoskeleton
structure and dynamics.

hfkd2_3ol7: The new protein can find application in modulation of the respiratory 
electron transport chain pathways of mitochondria. 

hfkd2_46j20: The new protein can find application in modulating the 
homoprotocatechuate degradative pathway and as an enzyme for biotechnologic
production processes. 

hfkd2_46kl9: The new protein can find application in modulating/blocking the 
expression of genes controlled by the hepatocyte nuclear factor-1.

hfkd2_46m4: SAR1 proteins are involved in vesicular transport between the
endoplasmic reticulum and the Golgi apparatus. 

hfkd2_46kl4: rab6 is a ubiquitous ras-like GTPase involved in intra-Golgi transport. 
The new protein can find application in modulating the transport of vesicles inside the 
Golgi apparatus. 

Uterus Associated: 

hutel_18il9: The SREBP-2 protein is embedded in the membranes of the nucleus and 
endoplasmic reticulum. In cholesterol-depleted cells the proteins are cleaved to release 
soluble NH2-terminal fragments that enter the nucleus and activate genes encoding 
the low density lipoprotein receptor and enzymes of cholesterol synthesis. The new 
protein is a putative transcription factor capable of protein-protein interaction via a 
LIM domain and additionally shows similarity to the common sunflower transcription
factor SF3. 

hutel_1811: The novel protein is similar to several 40S ribosomal proteins and
therefore seems to be part of the corresponding ribosome subunit.

hutel_19g22: The new protein can find application in modulation of tissue
calcification, especially in the uterus.

hutel_19hl7: The new protein can find application in modulating the response of
cells to oxysterols.

hutel_20bl9: The novel protein seems to be a novel enzyme with sarcosine oxidase 
activity. The new protein can find application in modulation of sarcosine metabolism 
and as a new enzyme for biotechnologic production processes. 

hutel_20g21: The novel protein seems to be a new ras inhibitor protein. The new
protein can find application in modulating/blocking ras dependent signal transduction 
pathways. 

hutel_20hl3: The novel protein is a new human alpha-adaptin. The new protein can 
find application in modulating endocytosis and vesicle trafficking in cells. 

hutel_20ml1: The new protein can find application in modulating/blocking the
activity of protein phosphatase-1 and in modulating the cell cycle.

hutel_20m24: This protein is a putative mannosyl transferase that is involved in the 
assembly of the core oligosaccharide Glc3Man9GlcNAc2. The new protein can find 
application in modulation of glycosylation of proteins and as a new enzyme for 
biotechnologic production processes. 

hutel_22el2: The new protein can find application in modulating the
cornichon-modulated signal transduction pathway and also the EGF receptor signaling processes.

hutel_23el3: The novel protein contains a serine protease of the subtilase family with 
an aspartic acid-containing active site. The new protein can find application in 
modulation of proteinase activity in cells and as a new enzyme for proteomics and 
biotechnologic production processes. 

hutel_24j6: The new protein can find application in modulation of cell-cell adhesion.

hutel_24h3: The new protein can find application as a useful marker for chondro- 
osteogenic cell differentiation and for the modulation of chondro-osteogenic cell 
differentiation. 

Fetal Brain: 

hfbr2_16cl6: The new protein can find application in modulating/blocking
cytoskeleton-membrane protein interactions.

hfbr2_23b21: The new protein can find application in modulating/blocking the
guanylate cyclase pathway.

hfbr2_23bl0: The new protein can find application in modulation of splicing. 

hfbr2_2b5: The novel protein contains the typical (xxG)n repeat of collagen proteins 
and a Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a 
new collagen alpha chain. The new protein can find application in modulation of 
connective tissue, bone and cartilage development and maintenance.

hfbr2_2cl7: The new protein can find application in modulating/blocking G-protein- 
dependent pathways. 

hfbr2_2dl5: The new protein can find application in modulating early 
spermatogenesis. 

hfbr2_2il7: The new protein can find clinical application in modulating the transport 
of glycoproteins inside cells, especially of the LDL receptor. 

hfbr2_2kl4: Tumour-suppressor genes are known to be involved in the control of cell 
growth and division, interacting with proteins which control the cell cycle. The N33 
gene is significantly methylated in tumour cells, a mechanism by which 
tumour-suppressor genes are inactivated in cancer. In addition, the novel protein contains an 
RGD cell attachment site. Therefore, the novel protein is the product of a new putative 
tumour-suppressor gene. 

hfbr_3cl 8: RNA helicases comprise a large family of proteins that are involved in 
basic biological systems such as nuclear and mitochondrial splicing processes, RNA 
editing, rRNA processing, translation initiation, nuclear mRNA export, and mRNA 
degradation. RNA helicases are essential factors in cell development and 
differentiation, and some of them play a role in transcription and replication of viral 
single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this 
subgroup. 

hfbr_3g8: The new protein can find application in modulating NAT assembly and action 
and may therefore be important in the metabolism of drugs and environmental mutagens. 

hfbr2_62bl 1: The rac small GTPase is associated with type-I phosphatidylinositol 
4-phosphate 5-kinase and regulates the production of phosphatidylinositol 
4,5-bisphosphate. The new protein is expected to activate p21rac-related small GTPases. 

hfbr2_62ol7: The new protein can find application in modulation of cholesterol 
binding and transport by LDL-receptors and LDL-binding proteins. 

hfbr_6b24: The new protein can find application in modulation of rhamnose 
metabolism and as a new enzyme for biotechnologic production processes. 

hfbr_72bl8: The new protein can find application in modulating DNA repair and 
mutagenesis. 

hfbr_78c4: The new protein can find application in modulating/blocking the response 
of cells to interferons. 

hfbr_78k24: These enzymes are involved in the processing of poly-ubiquitin 
precursors as well as that of ubiquitinated proteins. The new protein can find 
application in modulation of protein stability/degradation in cells. 

hfbr_82e4: The new protein can find clinical application in modulating/blocking 
calmodulin-mediated pathways in human neuronal cells. 

VARIANTS OF THE INVENTIVE DNA MOLECULES 
Variants in General 

"Variants," according to the invention, include DNA and/or protein molecules that 
resemble, structurally and/or functionally, those set forth herein. Variants may be isolated 
from natural sources ("homologs"), may be entirely synthetic or may be based in part on both 
natural and synthetic approaches. 

The section set forth below presents various structural and functional characteristics of 
molecules within the invention. Preferred molecules are characterized by a combination of 
one or more of these characteristics. For instance, some preferred molecules are described 
with reference to at least two structural characteristics, while others may be described with 
reference to at least one structural and at least one functional characteristic. 

It will be recognized by the skilled artisan that structure ultimately defines function, 
i.e., the functions of the molecules described herein derive from the structures of those 
molecules. Accordingly, the structural variants described below that bear the closest 
structural relationship (as variously defined below) to the inventive molecules are the variants 
that most likely will preserve biological function. This relationship between structure and 
function will guide the skilled artisan in identifying the preferred embodiments of the 
invention. 

Splicing Variants 

It is well known that eukaryotic structural genes comprise both protein-coding 
and non-coding portions. When the messenger RNA is transcribed from the DNA template, 
it contains introns, which are non-coding, and exons, which are coding. In order to form a 
translation-competent mRNA, the introns must be "spliced" out of this initial pre-mRNA. 

Specific sequences within the pre-mRNA represent "splice junctions" that direct the 
cellular splicing machinery to the appropriate position. The splice junctions are loosely 
conserved sequence regions of the pre-mRNA, and the introns they bound almost invariably 
begin with GT and end with AG (DNA perspective). The 5' end of the splice junction typically 
contains about nine somewhat conserved residues, for example, (C/A)AGGT(A/G)AGT. The 3' end 
usually contains a pyrimidine-rich stretch of at least about 11 nucleotides, followed by 
N(C/T)AGG. Splicing occurs before the GT and after the AG. Mount, Nucleic Acids Res. 
10:459-72 (1982). 
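
The junction rules above can be illustrated with a short Python sketch. It is a minimal illustration only, assuming the 5' consensus reads (C/A)AG followed by GT(A/G)AGT and the 3' consensus is a run of at least 11 pyrimidines followed by N(C/T)AG and a G; the function names and the test sequence are hypothetical, and real splice-site prediction is considerably more involved.

```python
import re

# Assumed consensus patterns (DNA perspective), taken from the text above.
DONOR = re.compile(r"[CA]AGGT[AG]AGT")            # exon ...(C/A)AG | GT(A/G)AGT... intron
ACCEPTOR = re.compile(r"[CT]{11,}[ACGT][CT]AGG")  # intron ...Py-tract N(C/T)AG | G... exon


def candidate_introns(pre_mrna):
    """Return (start, end) index pairs of candidate introns.

    start is the index of the G in the intron's leading GT (cut before GT);
    end is the index just past the intron's trailing AG (cut after AG).
    """
    seq = pre_mrna.upper()
    donors = [m.start() + 3 for m in DONOR.finditer(seq)]
    acceptors = [m.end() - 1 for m in ACCEPTOR.finditer(seq)]
    return [(d, a) for d in donors for a in acceptors if a > d]


def splice_out(pre_mrna, intron):
    """Remove one candidate intron and join the flanking exons."""
    start, end = intron
    return pre_mrna[:start] + pre_mrna[end:]


if __name__ == "__main__":
    exon1, exon2 = "ATGGCACAG", "GCTTGA"
    intron = "GTAAGT" + "ACGAACGAAC" + "TTTTCCCCTTTT" + "ACAG"
    pre = exon1 + intron + exon2
    for candidate in candidate_introns(pre):
        print(candidate, splice_out(pre, candidate))
```

The sketch reports only exact matches to the assumed consensus; because the junctions are loosely conserved, a practical search would tolerate mismatches or use a weighted scoring scheme.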

Interestingly, exons often correspond to discrete functional domains of the protein 
product. The intron/exon arrangement thus creates a linear array of nucleotides which can be 
correlated to discrete, and often interchangeable, functional protein fragments. Go, Nature 
291:90-92 (1981); Branden et al., EMBO J. 3:1307-10 (1984). This linear arrangement 
creates the possibility of generating multiple different full length proteins by rearranging the 
order of the different functional portions in the array. For example, if a set of exons is 
arranged 1-2-3-4, where (-) represents the introns separating the exons, a splicing event need 
not simply produce 1234, but may produce 123, 134, 124 and so on. Production of different 
mRNA products in this way is commonly called "alternative splicing." Andreadis et al., Ann. 
Rev. Cell Biol. 3:207-42 (1987). 
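
The combinatorial point made above can be sketched in Python. The sketch assumes, for illustration only, that any order-preserving subset of exons is a possible product; the function name and the min_exons parameter are hypothetical and are not part of the disclosure.

```python
from itertools import combinations


def splice_products(exons, min_exons=1):
    """Return every order-preserving selection of exons, longest first."""
    products = []
    for k in range(len(exons), min_exons - 1, -1):
        # combinations() keeps the original left-to-right exon order
        for combo in combinations(exons, k):
            products.append("".join(combo))
    return products


if __name__ == "__main__":
    # Exons arranged 1-2-3-4, keeping at least three exons per product
    print(splice_products(["1", "2", "3", "4"], min_exons=3))
```

In this model the number of products grows rapidly with the number of exons, which is why only a subset is usually observed or of interest.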

Some of the present DNA molecules can be represented in modular fashion in terms of 
their coding regions. Essentially, these modules are exons (though each "exon" may in fact 
be made up of several exons), which may be combined in different ways to form a variety of 
different DNA molecules, each encoding a different functional protein. Splicing variants are 
indicated below. 



Degenerate Variants 

One aspect of the present invention provides "degenerate variants" of the nucleic acid 
fragments of the present invention. A "degenerate variant" is a nucleotide fragment which 
differs in nucleotide sequence from the inventive molecules but, due to the degeneracy 
of the genetic code, encodes an identical polypeptide sequence. 

Given the known relationship between DNA sequences and the proteins they encode, 
degenerate variants typically are described by reference to this relationship. It is well known 
that the degeneracy of the genetic code results in many possible DNA sequences which 
encode a particular protein. Indeed, of the three bases which comprise an amino acid- 
encoding triplet, the third position, and often the second, almost always may vary. This fact 
alone allows for a class of variant DNA molecules which encode protein sequences identical 
to those disclosed herein, yet have about 30% sequence variation. In other words, the variant 
DNA molecules are about 70% identical to the inventive DNAs, having no additional or 
deleted sequences. Thus, one aspect of the invention provides degenerate variant DNA 
molecules encoding the inventive protein sequences. 

In one embodiment, these variants have at least about 70% sequence identity with the 
DNA molecules described herein. In a preferred embodiment, these variants have at least 
about 80% sequence identity to the inventive molecules. In a more preferred embodiment 
these variants have at least about 90% sequence identity with the inventive molecules. 
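
By way of illustration, the following Python sketch builds a degenerate variant of a short coding sequence and measures its identity to the original. It assumes the standard genetic code and an in-frame, upper-case input whose length is a multiple of three; the function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def degenerate_variant(dna):
    """Replace each codon with the synonymous codon that differs from it most."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(max(synonyms, key=lambda c: sum(a != b for a, b in zip(c, codon))))
    return "".join(out)


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGCTGAGCCGTTGG"  # hypothetical fragment encoding M-L-S-R-W
    variant = degenerate_variant(original)
    print(variant, translate(variant), round(percent_identity(original, variant), 1))
```

The variant encodes the same polypeptide while differing at the nucleotide level; the achievable divergence depends on the amino acid composition, since methionine and tryptophan each have a single codon.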

Conservative Amino Acid Variants 

Variants according to the invention also may be made that conserve the overall 
molecular structure of the encoded proteins. Given the properties of the individual amino 
acids comprising the disclosed protein products, some rational substitutions will be recognized 
by the skilled worker. Amino acid substitutions, i.e. "conservative substitutions," may be 
made, for instance, on the basis of similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues involved. 

For example: (a) nonpolar (hydrophobic) amino acids include alanine, leucine, 
isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; (b) polar neutral 
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; 
(c) positively charged (basic) amino acids include arginine, lysine, and histidine; and (d) 
negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Substitutions 
typically may be made within groups (a)-(d). In addition, glycine and proline may be 
substituted for one another based on their ability to disrupt α-helices. Similarly, certain 
amino acids, such as alanine, cysteine, leucine, methionine, glutamic acid, glutamine, 
histidine and lysine are more commonly found in α-helices, while valine, isoleucine, 
phenylalanine, tyrosine, tryptophan and threonine are more commonly found in β-pleated 
sheets. Glycine, serine, aspartic acid, asparagine, and proline are commonly found in turns. 
Some preferred substitutions may be made among the following groups: (i) S and T; (ii) P and 
G; and (iii) A, V, L and I. Given the known genetic code, and recombinant and synthetic 
DNA techniques, the skilled scientist readily can construct DNAs encoding the conservative 
amino acid variants. 

As used herein, "sequence identity" between two polypeptide sequences indicates the 
percentage of amino acids that are identical between the sequences. "Sequence similarity" 
indicates the percentage of amino acids that either are identical or that represent conservative 
amino acid substitutions. 
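
The distinction between sequence identity and sequence similarity can be sketched as follows. The sketch assumes two pre-aligned polypeptide sequences of equal length without gaps, and treats a substitution as conservative when both residues fall within the same one of groups (a)-(d) above; the function names are hypothetical.

```python
# Groups (a)-(d) from the text, in one-letter amino acid code
GROUPS = (
    frozenset("ALIVPFWM"),  # (a) nonpolar (hydrophobic)
    frozenset("GSTCYNQ"),   # (b) polar neutral
    frozenset("RKH"),       # (c) positively charged (basic)
    frozenset("DE"),        # (d) negatively charged (acidic)
)


def is_conservative(x, y):
    """True when both residues belong to the same group (a)-(d)."""
    return any(x in group and y in group for group in GROUPS)


def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(seq_a.upper(), seq_b.upper()))
    identical = sum(x == y for x, y in pairs)
    similar = sum(x == y or is_conservative(x, y) for x, y in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


if __name__ == "__main__":
    # A=A identical; L/I and K/R conservative; D/S neither
    print(identity_and_similarity("ALKD", "AIRS"))
```

Similarity is always at least as high as identity, because every identical position also counts as similar.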

Functionally Equivalent Variants 

Yet another class of DNA variants within the scope of the invention may be described 
with reference to the product they encode. As shown below, some of the inventive DNA 
molecules encode a protein having a degree of homology with known proteins, or protein 
domains. It is expected, therefore, that they will have some or all of the requisite functional 
features of such molecules. These "functionally equivalent variants" are 
characterized by the fact that their products are functionally equivalent, with respect to 
biological activity, to certain known molecules. 

The instant invention provides information on common structural motifs, including 
consensus sequences that will guide the artisan in constructing functionally equivalent 
variants. It will be understood that the motifs, identified for each inventive protein, may be 
modified within the identified consensus sequences. Thus, the invention contemplates the 
proteins disclosed herein that contain variability in the consensus sequences identified, and the 
invention further contemplates the full range of nucleic acids encoding them, and the 
complements of those nucleic acids. 

Hybridizing Variants 

DNA variants within the invention also may be described by reference to their 
physical properties in hybridization. One skilled in the field will recognize that DNA can be 
used to identify its complement and, since DNA is double stranded, its equivalent or 
homolog, using nucleic acid hybridization techniques. It will also be recognized that 
hybridization can occur with less than 100% complementarity. However, given appropriate 
choice of conditions, hybridization techniques can be used to differentiate among DNA 
sequences based on their structural relatedness to a particular probe. For guidance regarding 
such conditions see, for example, Sambrook et al., 1989, MOLECULAR CLONING, A 
LABORATORY MANUAL, Cold Spring Harbor Press, N.Y.; and Ausubel et al., 1989, 
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Greene Publishing Associates and 
Wiley Interscience, N.Y. 

Structural relatedness between two polynucleotide sequences can be expressed as a 
function of "stringency" of the conditions under which the two sequences will hybridize with 
one another. As used herein, the term "stringency" refers to the extent that the conditions 
disfavor hybridization. Stringent conditions strongly disfavor hybridization, and only the 
most structurally related molecules will hybridize to one another under such conditions. 
Conversely, non-stringent conditions favor hybridization of molecules displaying a lesser 
degree of structural relatedness. Hybridization stringency, therefore, directly correlates with 
the structural relationships of two nucleic acid sequences. The following relationships are 
useful in correlating hybridization and relatedness (where T_m is the melting temperature of a 
nucleic acid duplex): 

a. T_m = 69.3 + 0.41(G+C)% 

b. The T_m of a duplex DNA decreases by 1°C with every increase of 1% in the 
number of mismatched base pairs. 

c. (T_m)_μ2 - (T_m)_μ1 = 18.5 log10(μ2/μ1) 

where μ1 and μ2 are the ionic strengths of two solutions. 
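
The three relationships above can be combined in a short Python sketch. It assumes that relationship (c) gives the difference in melting temperature between two solutions of ionic strengths μ1 and μ2 as 18.5 times the base-10 logarithm of their ratio; the function names are hypothetical.

```python
import math


def tm_from_gc(gc_percent):
    """Relationship (a): melting temperature (deg C) from percent G+C."""
    return 69.3 + 0.41 * gc_percent


def tm_with_mismatch(tm, mismatch_percent):
    """Relationship (b): T_m falls 1 deg C per 1% mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def tm_shift_for_ionic_strength(mu1, mu2):
    """Relationship (c): T_m in solution 2 minus T_m in solution 1."""
    if mu1 <= 0 or mu2 <= 0:
        raise ValueError("ionic strengths must be positive")
    return 18.5 * math.log10(mu2 / mu1)


if __name__ == "__main__":
    perfect = tm_from_gc(50.0)                        # fully matched duplex, 50% G+C
    related = tm_with_mismatch(perfect, 10.0)         # 10% mismatched base pairs
    shift = tm_shift_for_ionic_strength(1.0, 0.1)     # tenfold drop in ionic strength
    print(round(perfect, 1), round(related, 1), round(shift, 1))
```

A tenfold reduction in ionic strength lowers the melting temperature by 18.5°C under relationship (c), which is why low-salt washes are the more stringent ones.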

Hybridization stringency is a function of many factors, including overall DNA 
concentration, ionic strength, temperature, probe size and the presence of agents which 
disrupt hydrogen bonding. Factors promoting hybridization include high DNA 
concentrations, high ionic strengths, low temperatures, longer probe size and the absence of 
agents that disrupt hydrogen bonding. 

Hybridization usually is done in two stages. First, in the "binding" stage, the probe is 
bound to the target under conditions favoring hybridization. Stringency is usually controlled 
at this stage by altering the temperature. For high stringency, the temperature is usually 
between 65°C and 70°C, unless short (<20 nt) oligonucleotide probes are used. A 
representative hybridization solution comprises 6X SSC, 0.5% SDS, 5X Denhardt's solution 
and 100 µg of non-specific carrier DNA. See Ausubel et al., supra, section 2.9, supplement 
27 (1994). Of course many different, yet functionally equivalent, buffer conditions are 
known. Where the degree of relatedness is lower, a lower temperature may be chosen. Low 
stringency binding temperatures are between about 25°C and 40°C. Medium stringency is 
from at least about 40°C to less than about 65°C. High stringency is at least about 65°C. 
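
The binding temperature ranges just given can be expressed as a simple lookup. This is a sketch only: the ranges are stated as approximate ("about"), and assigning exactly 40°C to the medium class is an assumption made here to resolve the shared boundary; the function name is hypothetical.

```python
def binding_stringency(temp_c):
    """Classify a binding temperature (deg C) using the ranges in the text."""
    if temp_c >= 65:
        return "high"
    if temp_c >= 40:       # boundary at 40 deg C assigned to medium (assumption)
        return "medium"
    if temp_c >= 25:
        return "low"
    return "below the stated ranges"


if __name__ == "__main__":
    for temperature in (30, 50, 68):
        print(temperature, binding_stringency(temperature))
```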

Second, the excess probe is removed by washing. It is at this stage that more stringent 
conditions usually are applied. Hence, it is this "washing" stage that is most important in 
determining relatedness via hybridization. Washing solutions typically contain lower salt 
concentrations. One exemplary medium stringency solution contains 2X SSC and 0.1% SDS. 
A high stringency wash solution contains the equivalent (in ionic strength) of less than about 
0.2X SSC, with a preferred stringent solution containing about 0.1X SSC. The temperatures 
associated with various stringencies are the same as discussed above for "binding." The 
washing solution also typically is replaced a number of times during washing. For example, 
typical high stringency washing conditions comprise washing twice for 30 minutes at 55°C 
and three times for 15 minutes at 60°C. 

The present invention includes nucleic acid molecules that hybridize to the inventive 
molecules under high stringency binding and washing conditions. More preferred molecules 
(from an mRNA perspective) are those that are at least 50% of the length of any one of those 
depicted below. Particularly preferred molecules are at least 75% of the length of those 
molecules. 

Substitutions, Insertions, Additions and Deletions 

In a general sense, the preferred DNA variants of the invention are those that retain 
the closest relationship, as described by "sequence identity" to the inventive DNA molecules. 
According to another aspect of the invention, therefore, substitutions, insertions, additions 
and deletions of defined properties are contemplated. It will be recognized that sequence 
identity between two polynucleotide sequences, as defined herein, generally is determined 
with reference to the protein coding region of the sequences. Thus, this definition does not at 
all limit the amount of DNA, such as vector DNA, that may be attached to the molecules 
described herein. Preferred DNA sequence variants include molecules encoding proteins 
sharing some or all of any relevant biological activity of the native molecule. 

In creating these variants, the skilled worker will be guided by reference to the protein 
structure. First, insertions and deletions in any recognized functional domain, above, 
generally should be avoided, except as noted below in the section entitled "Proteins," where 
this domain is discussed in detail. Alterations in such domains usually will be limited to 
conservative amino acid substitutions. In addition, where insertions and deletions are desired, 
this may be accomplished at the N- and/or C-terminus of the protein molecule (or the 
corresponding coding regions of the DNA). If insertions or deletions are made within the 
protein, deletions of major structural features usually should be avoided. Thus, a preferred 
place to make insertion or deletion variants is in non-structural regions, such as linker regions 
between two alpha helices. 

"Substitutions" generally refer to alterations in the DNA sequence which do not 
change its overall length, but only alter one or more nucleotide positions, substituting one for 
another in the common sense of the word. One class of preferred substitutions, "degenerate 
substitutions," are those that do not alter the encoded amino acid sequence. Some substitutions 
retain 50%, 55%, 60% or 65% identity. Preferred substitutions retain at least about 70% 
identity, more preferably at least 70% or 75% identity, with the inventive DNAs. Some more 
preferred molecules have at least about 80% identity, more preferably at least 80% or 85% 
identity. Particularly preferred DNAs share at least about 90% identity, more preferably at 
least 90% or 95% identity. 

"Insertions," unlike substitutions, alter the overall length of the DNA molecule, and 
thus sometimes the encoded protein. Insertions add extra nucleotides to the interior (not the 
5' or 3' ends) of the subject DNAs. Preferred insertions are made with reference to the 
protein sequence encoded by the DNA. Thus, it is most preferred to provide an insertion in 
the DNA at a location that corresponds to an area of the encoded protein which lacks 
structure. For instance, it typically would not be beneficial, if the preservation of biological 
activity is desired, to provide an insertion within an alpha-helical region or a beta-pleated 
sheet. Accordingly, non-structural areas, such as those containing helix-breaking glycine 
and proline residues, are most preferred sites of insertion. Other preferred sites of insertion 
are the splice sites, which are indicated above in the description of the inventive DNA 
molecules. 

While the optimal size of insertions will vary depending upon the site of insertion and 
its effect on the overall conformation of the encoded protein, some general guides are useful. 
Generally, the total insertions (irrespective of their number) should not add more than about 
30% (or preferably not more than 30%) to the overall size of the encoded protein. More 
preferably, the insertion adds less than about 10-20% (yet more preferably 10-20%) in size, 
with less than about 10% being most preferred. The number of insertions is limited only by 
the number of suitable insertion sites, and secondarily by the foregoing size preferences. 

"Additions," like insertions, also add to the overall size of the DNA molecule, and 
usually the encoded protein. However, instead of being made within the molecule, they are 
made on the 5' or 3' end, usually corresponding to the N- or C-terminus of the encoded 
protein. Unlike deletions, additions are not very size-dependent. Indeed, additions may be of 
virtually any size. Preferred additions, however, do not exceed about 100% of the size of the 
native molecule. More preferably, they add less than about 60 to 30% to the overall size, 
with less than about 30% being most preferred. 

"Deletions" diminish the overall size of the DNA and, therefore, also reduce the size 
of the protein encoded by that DNA. Deletions may be made from either end of the molecule 
or internal to it. Typical preferred deletions remove discrete structural features of the 
encoded protein. For example, some deletions will comprise the deletion of one or more 
exons which may define a structural feature. Preferred deletions remove less than about 30% 
of the size of the subject molecule. More preferred deletions remove less than about 20% and 
most preferred deletions remove less than about 10%. 

Computer-Defined Variants and Definition of "Sequence Identity" 

In general, both the DNA and protein molecules of the invention can be defined with 
reference to "sequence identity." As used herein, "sequence identity" refers to a comparison 
made between two molecules using, for example, the standard Smith-Waterman algorithm 
that is well known in the art. 
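
For orientation, a minimal Python sketch of a Smith-Waterman local alignment with a linear gap penalty follows. The scoring values (match 2, mismatch -1, gap -2), the function names, and the choice to report identity over the aligned region are illustrative assumptions and are not parameters specified herein.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; negative scores are reset to zero
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if h[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_identity(aligned_a, aligned_b):
    """Percent identical columns over the aligned region (gaps count as mismatches)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TTTACGTAAA", "GGGACGTCCC")
    print(score, top, bottom, alignment_identity(top, bottom))
```

Because the alignment is local, the reported identity applies only to the best-matching region, which should be kept in mind when comparing it with the length-based preferences stated elsewhere herein.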

Some molecules have at least about 50%, 55% or 60% identity. Preferred molecules 
are those having at least about 65% sequence identity, more preferably at least 65% or 70% 
sequence identity. Other preferred molecules have at least about 80%, more preferably at 
least 80% or 85%, sequence identity. Particularly preferred molecules have at least about 
90% sequence identity, more preferably at least 90% sequence identity. Most preferred 
molecules have at least about 95%, more preferably at least 95%, sequence identity. As used 
herein, two nucleic acid molecules or proteins are said to "share significant sequence identity" 
if the two contain regions which possess greater than 85% sequence (amino acid or nucleic 
acid) identity. 

"Sequence identity" is defined herein with reference to the Blast 2 algorithm, which is 
available at the NCBI (http://www.ncbi.nlm.nih.gov/BLAST), using default parameters. 
References pertaining to this algorithm include those found at 
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html; Altschul, S.F., Gish, W., Miller, 
W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 
215:403-410; Gish, W. & States, D.J. (1993) "Identification of protein coding regions by 
database similarity search." Nature Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang, 
J. (1996) "Applications of network BLAST server" Meth. Enzymol. 266:131-141; Altschul, 
S.F., Madden, T.L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." 
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L. (1997) "PowerBLAST: A 
new network BLAST application for interactive or automated sequence analysis and 
annotation." Genome Res. 7:649-656. 

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive molecules can be constructed in 
several different ways. For example, they may be constructed as completely synthetic DNAs. 
Methods of efficiently synthesizing oligonucleotides in the range of 20 to about 150 
nucleotides are widely available. See Ausubel et al., supra, section 2.11, Supplement 21 
(1993). Overlapping oligonucleotides may be synthesized and assembled in a fashion first 
reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971); see also Ausubel et al., Section 
8.2. The synthetic DNAs are designed with convenient restriction sites engineered at the 5' 
and 3' ends of the gene to facilitate cloning into an appropriate vector. 
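
The design of overlapping oligonucleotides for such an assembly can be sketched in Python. The sketch assumes fixed-length oligonucleotides that alternate between the top and bottom strands with a constant overlap; the default lengths, the function names and the example sequence are hypothetical, and restriction sites are presumed to be included already in the input sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlapping_oligos(seq, oligo_len=60, overlap=20):
    """Split seq into oligos that alternate strands and overlap their neighbours."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        piece = seq[start:start + oligo_len]
        # Even-numbered oligos are top strand; odd-numbered are bottom strand
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
        if start + oligo_len >= len(seq):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    gene = "GAATTC" + "ATGGCTAGCAAGGAGCTGTTCACCGGTGTTGTCCCAATTCTGGTTGAACTGGATGGC" * 3 + "AAGCTT"
    for number, oligo in enumerate(overlapping_oligos(gene), start=1):
        print(number, len(oligo), oligo)
```

Each oligonucleotide stays within the 20 to 150 nucleotide range mentioned above, and each overlap provides the complementary region through which neighbouring oligonucleotides anneal.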

An alternative method of generating variants is to start with one of the inventive 
DNAs and then to conduct site-directed mutagenesis. See Ausubel et al., supra, chapter 8, 
Supplement 37 (1997). In a typical method, a target DNA is cloned into a single-stranded 
DNA bacteriophage vehicle. Single-stranded DNA is isolated and hybridized with an 
oligonucleotide containing the desired nucleotide alteration(s). The complementary strand is 
synthesized and the double stranded phage is introduced into a host. Some of the resulting 
progeny will contain the desired mutant, which can be confirmed using DNA sequencing. In 
addition, various methods are available that increase the probability that the progeny phage 
will be the desired mutant. These methods are well known to those in the field and kits are 
commercially available for generating such mutants. 
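
A mutagenic oligonucleotide for the typical method described above can be sketched in Python. The sketch assumes a single-base change in the single-stranded template, flanked on each side by a fixed number of matching nucleotides; the flank length, the function names and the example template are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mutagenic_oligo(template, position, new_base, flank=12):
    """Return an oligo that anneals to the template with one mismatch.

    template is the single-stranded (phage) DNA, position is the 0-based index
    of the base to change, and new_base is the base desired on that strand.
    """
    if template[position] == new_base:
        raise ValueError("new_base is identical to the existing base")
    mutated = template[:position] + new_base + template[position + 1:]
    start = max(0, position - flank)
    end = min(len(template), position + flank + 1)
    # The oligo is complementary to the mutated template region
    return reverse_complement(mutated[start:end])


if __name__ == "__main__":
    template = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(mutagenic_oligo(template, position=15, new_base="A"))
```

After second-strand synthesis primed by such an oligonucleotide, replication in the host yields a mixture of parental and mutant progeny, consistent with the need to confirm the mutant by sequencing.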

ISOLATING HOMOLOGS 

Methods 

By using the sequences disclosed herein as probes or as primers, and techniques such 
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs. 
"Homologs" are essentially naturally-occurring variants and include allelic, species-specific 
and tissue-specific variants. 

Region-specific primers or probes derived from the nucleotide sequence(s) provided 
can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies 
containing cloned DNA encoding a homolog using known methods (Innis et al., PCR 
Protocols, Academic Press, San Diego, CA (1990)). Such an application is useful in 
diagnostic methods, as described in more detail below, as well as in preparing full-length 
DNAs from various sources. The PCR primers are preferably at least 15 bases, and more 
preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that 
the primer pairs have approximately the same G/C ratio, so that melting temperatures are 
approximately the same. As a general guide, the formula T_m (°C) = 3(G+C) + 2(A+T) is 
useful. 
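
The primer guidelines above can be sketched in Python. The weights 3 and 2 are those of the formula given in the text; they are left adjustable because the widely cited Wallace rule uses a weight of 4 for G+C. The permitted temperature difference between the two primers, the function names and the example primers are assumptions made for illustration.

```python
def primer_tm(primer, gc_weight=3, at_weight=2):
    """Approximate melting temperature (deg C) from base composition."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    at = p.count("A") + p.count("T")
    return gc_weight * gc + at_weight * at


def primers_compatible(forward, reverse, min_len=15, max_tm_diff=4):
    """Check minimum length and that both primers melt at similar temperatures."""
    long_enough = len(forward) >= min_len and len(reverse) >= min_len
    return long_enough and abs(primer_tm(forward) - primer_tm(reverse)) <= max_tm_diff


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCAT"   # hypothetical 18-mer
    rev = "GGATCCATTGCAAGTTCA"   # hypothetical 18-mer
    print(primer_tm(fwd), primer_tm(rev), primers_compatible(fwd, rev))
```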

When using primers derived from the inventive sequences, one skilled in the art will 
recognize that by employing high stringency conditions (e.g., annealing at 50-60°C), only 
sequences with greater than 75% sequence identity to the primer will be amplified. By 
employing lower stringency conditions (e.g., annealing at 35-37°C), sequences which have 
greater than 40-50% sequence identity to the primer also will be amplified. 

The PCR product may be subcloned and sequenced to confirm that it indeed displays 
the expected sequence identity. The PCR fragment may then be used to isolate a full length 
cDNA clone by a variety of methods. For example, the amplified fragment may be labeled 
and used to screen a bacteriophage cDNA library. Alternatively, the labeled fragment may be 
used to screen a genomic library. 

PCR technology may also be utilized to isolate full length cDNA sequences. For 
example, RNA may be isolated, following standard procedures, from an appropriate cellular 
or tissue source. A reverse transcription reaction may be performed on the RNA using an 
oligonucleotide primer specific for the most 5' end of the amplified fragment for the priming 
of first strand synthesis. The resulting RNA/DNA hybrid may then be "tailed" with guanines 
using a standard terminal transferase reaction, the hybrid may be digested with RNase H, 
and second strand synthesis may then be primed with a poly-C primer. Thus, cDNA 
sequences upstream of the amplified fragment may easily be isolated. For a review of cloning 
strategies which may be used, see, e.g., Sambrook et al., 1989, supra. 

When using DNA probes derived from the inventive sequences for colony/plaque 
hybridization, one skilled in the art will recognize that by employing medium to high 
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and 
washing at 50-65°C in 0.5X SSPC), sequences having regions with greater than 90% 
sequence identity to the probe can be obtained, and that by employing lower stringency 
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 
42°C in SSPC), sequences having regions with greater than 35-45% sequence identity to the 
probe will be obtained. 

Suitably, genomic or cDNA libraries can be constructed and screened in accord with 
the previous paragraph. The libraries should be derived from a tissue or organism that is 
known to express the gene of interest, or that is suspected of expressing the gene. The clone 
containing the homolog may then be purified through methods routinely practiced in the art, 
and subjected to sequence analysis. 

Additionally, an expression library can be constructed utilizing DNA isolated from or 
cDNA synthesized from a tissue or organism that is known to express the gene of interest, or 
that is suspected of expressing the gene. In this manner, clones may be induced and screened 
using standard antibody screening techniques in conjunction with antibodies raised against the 
normal gene product, as described herein. (For screening techniques, see, for example, 
Harlow, E. and Lane, eds., 1988, ANTIBODIES: A LABORATORY MANUAL, Cold 
Spring Harbor Press, Cold Spring Harbor.) 

Human Homologs 

Any organism or tissue can be used as the source for homologs of the present 
invention so long as the organism or tissue naturally expresses such a protein or contains 
genes encoding the same. The most preferred organism for isolating homologs is human. 

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is encoded by the inventive DNA 
molecules presented. Other proteins according to the invention are those encoded by the 
DNA variants described above. As noted, these variants are designed with the encoded 
proteins in mind. 

A preferred class of protein fragments includes those fragments which retain any 
biological activity. These molecules share functional features common to the family of proteins, 
although these characteristics may vary in degree. 

According to one aspect of the invention fragments of the inventive proteins are 
contemplated. Some preferred fragments are those which are capable of eliciting an immune 
response. Generally these "antigenic" fragments will be from about five amino acids in 
length to about fifty amino acids in length. Some preferred antigenic fragments are from five 
to about twenty amino acids long. "Antigenic" response may refer to a T cell response, a B 
cell response or a response by cells of the macrophage/monocyte lineages. In most cases, 
however, it will refer to the immune response involved in the generation of antibodies. In 
other words, the relevant immune response is that of helper T cells and/or B cells. These 
preferred molecules comprise one or more T cell and/or B cell epitopes. 

ANTIBODIES OF THE INVENTION 

Antibodies raised against the proteins and protein fragments of the invention also are 
contemplated by the invention. Described below are antibody products and methods for 
producing antibodies capable of specifically recognizing one or more epitopes of the presently 
described proteins and their derivatives. 

Antibodies include, but are not limited to polyclonal antibodies, monoclonal antibodies 
(mAbs), humanized or chimeric antibodies, single chain antibodies including single chain Fv 
(scFv) fragments, Fab fragments, Ffab'^ fragments, fragments produced by a Fab expression 
library, anti-idiotypic (anti-Id) antibodies, epitope-binding fragments, and humanized forms of 
any of the above. 
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As known to one in the art, these antibodies may be used, for example, in the 
detection of a target protein in a biological sample. They also may be utilized as part of 
treatment methods, and/or may be used as part of diagnostic techniques whereby patients may 
be tested for abnormal levels or for the presence of abnormal forms of such proteins. 

In general, techniques for preparing polyclonal and monoclonal antibodies as well as 
hybridomas capable of producing the desired antibody are well known in the art (Campbell, 
A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and 
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. 
Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 
(1975)), and include the trioma technique and the human B-cell hybridoma technique (Kozbor et al., 
Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy, 
Alan R. Liss, Inc. (1985), pp. 77-96). Antibodies may also be generated by the known 
techniques of phage display and in vitro immunization. 

Polyclonal Antibodies 

Polyclonal antibodies are heterogeneous populations of antibody molecules derived 
from the sera of animals immunized with an antigen, such as an inventive protein or an 
antigenic derivative thereof. 

Polyclonal antiserum, containing antibodies to heterogeneous epitopes of a single 
protein, can be prepared by immunizing suitable animals with the expressed protein described 
above, which can be unmodified or modified, as known in the art, to enhance 
immunogenicity. Immunization methods include subcutaneous or intraperitoneal injection of 
the polypeptide. 

Effective polyclonal antibody production is affected by many factors related both to 
the antigen and to the host species. For example, small molecules tend to be less 
immunogenic than others and may require the use of carriers and/or adjuvants. In addition, 
host animal response may vary with site of inoculation. Both inadequate and excessive doses 
of antigen may result in low titer antisera. In general, however, small doses (high ng to low 
µg levels) of antigen administered at multiple intradermal sites appear to be most reliable. 
Host animals may include but are not limited to rabbits, mice, chickens and rats, to name but 
a few. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. 
Clin. Endocrinol. Metab. 33:988-991 (1971). 

The protein immunogen may be modified or administered in an adjuvant in order to 
increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are 
well known in the art and include, but are not limited to, coupling the antigen with a 
heterologous protein (such as globulin or β-galactosidase) or the inclusion of an adjuvant 
during immunization. Adjuvants include Freund's (complete and incomplete), mineral gels 
such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, 
polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and 
potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and 
Corynebacterium parvum. 

Booster injections can be given at regular intervals, with at least one usually being 
required for optimal antibody production. The antiserum may be harvested when the 
antibody titer begins to fall. Titer may be determined semi-quantitatively, for example, by 
double immunodiffusion in agar against known concentrations of the antigen. See, for 
example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, ed., 
Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 
mg/ml of serum (about 12 µM). The antiserum may be purified by affinity chromatography 
using the immobilized immunogen carried on a solid support. Such methods of affinity 
chromatography are well known in the art. 
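
As a worked illustration of how a mass concentration of antibody relates to a molar concentration, the following Python sketch converts mg/ml to micromolar. The molecular mass of about 150 kDa for IgG is an assumption introduced here and is not stated in the text; the function and constant names are likewise hypothetical. Under that assumption, 0.1 to 0.2 mg/ml works out to roughly 0.7 to 1.3 µM, so the molar figure quoted in the paragraph above would correspond to a different assumed mass or concentration.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mol_weight_da: float) -> float:
    """Convert a mass concentration (mg/ml) to a molar concentration (µM)."""
    # mg/ml is numerically equal to g/L; dividing by g/mol gives mol/L,
    # and multiplying by 1e6 expresses the result in micromolar.
    return conc_mg_per_ml / mol_weight_da * 1e6


IGG_MW_DA = 150_000.0  # assumption: approximate mass of an intact IgG molecule

low = mg_per_ml_to_micromolar(0.1, IGG_MW_DA)
high = mg_per_ml_to_micromolar(0.2, IGG_MW_DA)
print(f"{low:.2f} to {high:.2f} micromolar")
```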

Affinity of the antisera for the antigen may be determined by preparing competitive 
binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical 
Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, 
Washington, D.C. (1980). 

In addition to using protein as the immunogen, DNA molecules may be used directly. 
In this manner, a DNA encoding the protein immunogen is administered. Boosting and 
harvesting are done in a manner analogous to that detailed above. Yet another method of 
producing antibodies entails immunizing chickens and harvesting the antibodies from their 
eggs. 

Monoclonal Antibodies 

Monoclonal antibodies (MAbs) are homogeneous populations of antibodies to a 
particular antigen. They may be obtained by any technique which provides for the production 
of antibody molecules by continuous cell lines in culture or in vivo. MAbs may be produced 
by making hybridomas which are immortalized cells capable of secreting a specific 
monoclonal antibody. 

Monoclonal antibodies to any of the proteins, peptides and epitopes thereof described 
herein can be prepared from murine hybridomas according to the classical method of Kohler, 
G. and Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No. 4,376,110) or 
modifications of the methods thereof, such as the human B-cell hybridoma technique (Kozbor 
et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 
80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, MONOCLONAL 
ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp. 77-96). 

In one method, a mouse is repetitively inoculated with a few micrograms of the 
selected protein over a period of a few weeks. The mouse is then sacrificed, and the 
antibody-producing cells of the spleen are isolated. 

The spleen cells are fused, typically using polyethylene glycol, with mouse myeloma 
cells, such as SP2/0-Ag14 myeloma cells. The excess, unfused cells are destroyed by growth 
of the system on selective media comprising aminopterin (HAT media). The successfully 
fused cells are diluted, and aliquots are plated to microtiter plates where growth is continued. 

Antibody-producing clones (hybridomas) are identified by detection of antibody in the 
supernatant fluid of the wells by immunoassay procedures. These include ELISA, as 
originally described by Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis, 
radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)) and modified methods 
thereof. 

Selected positive clones can be expanded and their monoclonal antibody product 
harvested for use. Detailed procedures for monoclonal antibody production are described in 
Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York, 
Section 21-2 (1989). The hybridoma clones may be cultivated in vitro or in vivo, for instance 
as ascites. Production of high titers of mAbs in vivo makes this the presently preferred 
method of production. Alternatively, hybridoma culture in hollow fiber bioreactors provides 
a continuous high yield source of monoclonal antibodies. 

The antibody class and subclass may be determined using procedures known in the art 
(Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry 
and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)). 
MAbs may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any 
subclass thereof. Methods of purifying monoclonal antibodies are well known in the art. 



Antibody Derivatives and Fragments 

Fragments or derivatives of antibodies include any portion of the antibody which is 
capable of binding the target antigen, or a specific portion thereof. Antibody derivatives 
include poly-specific (e.g., bi-specific) antibodies, which contain binding sites specific for two 
or more different epitopes. These epitopes may be from the same or different inventive 
molecules or one or more epitope may be from a molecule not specifically disclosed here. 

Antibody fragments specifically include F(ab')2, Fab, Fab' and Fv fragments. These 
can be generated from any class of antibody, but typically are made from IgG or IgM. They 
may be made by conventional recombinant DNA techniques or, using the classical method, by 
proteolytic digestion with papain or pepsin. See CURRENT PROTOCOLS IN 
IMMUNOLOGY, chapter 2, Coligan et al., eds., (John Wiley & Sons 1991-92). 

F(ab')2 fragments are typically about 110 kDa (IgG) or about 150 kDa (IgM) and 
contain two antigen-binding regions, joined at the hinge by disulfide bond(s). Virtually all, if 
not all, of the Fc is absent in these fragments. Fab' fragments are typically about 55 kDa 
(IgG) or about 75 kDa (IgM) and can be formed, for example, by reducing the disulfide 
bond(s) of an F(ab')2 fragment. The resulting free sulfhydryl group(s) may be used to 
conveniently conjugate Fab' fragments to other molecules, such as detection reagents (e.g., 
enzymes). 

Fab fragments are monovalent and usually are about 50 kDa (from any source). Fab 
fragments include the light (L) and heavy (H) chain, variable (VL and VH, respectively) and 
constant (CL and CH, respectively) regions of the antigen-binding portion of the antibody. The H 
and L portions are linked by an intramolecular disulfide bridge. 

Fv fragments are typically about 25 kDa (regardless of source) and contain the 
variable regions of both the light and heavy chains (VL and VH, respectively). Usually, the VL 
and VH chains are held together only by non-covalent interactions and, thus, they readily 
dissociate. They do, however, have the advantage of small size and they retain the same 
binding properties as the larger Fab fragments. Accordingly, methods have been developed 
to crosslink the VL and VH chains, using, for example, glutaraldehyde (or other chemical 
crosslinkers), intermolecular disulfide bonds (by incorporation of cysteines) and peptide 
linkers. The resulting Fv is now a single chain (i.e., scFv). 

Other antibody derivatives include single chain antibodies (U.S. Patent 4,946,778; 
Bird, Science 242:423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883 
(1988); and Ward et al., Nature 334:544-546 (1989)). Single chain antibodies are formed by 
linking the heavy and light chain fragments of the Fv region via an amino acid bridge, 
resulting in a single chain Fv (scFv). 

One preferred method involves the generation of scFvs by recombinant methods, 
which allows the generation of Fvs with new specificities by mixing and matching variable 
chains from different antibody sources. In a typical method, a recombinant vector would be 
provided which comprises the appropriate regulatory elements driving expression of a cassette 
region. The cassette region would contain a DNA encoding a peptide linker, with convenient 
sites at both the 5' and 3' ends of the linker for generating fusion proteins. The DNA 
encoding a variable region(s) of interest may be cloned in the vector to form fusion proteins 
with the linker, thus generating an scFv. 

In an exemplary alternative approach, DNAs encoding two Fvs may be ligated to the 
DNA encoding the linker, and the resulting tripartite fusion may be ligated directly into a 
conventional expression vector. The scFv DNAs generated by any of these methods may be 
expressed in prokaryotic or eukaryotic cells, depending on the vector chosen. 
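
The cassette approach described above amounts to joining two variable-region coding sequences to a linker coding sequence so that all three remain in one reading frame. The following Python sketch illustrates only that bookkeeping. It rests on several assumptions not taken from the text: the VH-linker-VL order, a (Gly4Ser)3 linker, and the short placeholder sequences, which are hypothetical and do not encode real variable regions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Assumption: a (Gly4Ser)3 linker, encoded here as GGT GGC GGT GGC TCT repeated three times.
LINKER_DNA = "GGTGGCGGTGGCTCT" * 3


def split_codons(dna: str) -> list:
    """Split a DNA string into consecutive three-base codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def assemble_scfv(vh_dna: str, linker_dna: str, vl_dna: str) -> str:
    """Join VH, linker and VL coding sequences into one in-frame open reading frame."""
    parts = [p.upper() for p in (vh_dna, linker_dna, vl_dna)]
    for name, part in zip(("VH", "linker", "VL"), parts):
        # Each segment must be a whole number of codons to keep the fusion in frame.
        if len(part) % 3 != 0:
            raise ValueError(f"{name} segment is not a whole number of codons")
    for name, part in zip(("VH", "linker"), parts[:2]):
        # A stop codon before the final segment would truncate the fusion protein.
        if any(codon in STOP_CODONS for codon in split_codons(part)):
            raise ValueError(f"{name} segment contains an in-frame stop codon")
    return "".join(parts)


# Hypothetical placeholder segments (four codons each).
scfv = assemble_scfv("ATGGCTCAGGTG", LINKER_DNA, "GACATCCAGTAA")
print(len(scfv), scfv[:12])
```

In practice the joins would be made at restriction sites flanking the linker, which may add codons at each junction; the frame check shown here would apply to the final assembled sequence as well.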

Antibody fragments which recognize specific epitopes may be generated by known 
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragments 
which can be produced by pepsin digestion of the antibody molecule and the Fab fragments 
which can be generated by reducing the disulfide bridges of the F(ab')2 fragments. 
Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science, 
246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the 
desired specificity. 

Derivatives also include "chimeric antibodies" (Morrison et al., Proc. Natl. Acad. 
Sci., 81:6851-6855 (1984); Neuberger et al., Nature, 312:604-608 (1984); Takeda et al., 
Nature, 314:452-454 (1985)). These chimeras are made by splicing the DNA encoding a 
mouse antibody molecule of appropriate specificity with, for instance, DNA encoding a 
human antibody molecule of appropriate specificity. Thus, a chimeric antibody is a molecule 
in which different portions are derived from different animal species, such as those having a 
variable region derived from a murine mAb and a human immunoglobulin constant region. 
These are also known sometimes as "humanized" antibodies and they offer the added 
They are, therefore, 



Labeled Antibodies 

The present invention further provides the above-described antibodies in detectably 
labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity 
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline 
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, 
etc. Procedures for accomplishing such labeling are well known in the art; see, for example, 
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 
62:308 (1979); Engvall et al., J. Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215 
(1976). The labeled antibodies of the present invention can be used for in vitro, in vivo, and 
in situ diagnostic assays. 

Immobilized Antibodies 

The foregoing antibodies also may be immobilized on a solid support. Examples of 
such solid supports include plastics such as polycarbonate, complex carbohydrates such as 
agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. 
Techniques for coupling antibodies to such solid supports are well known in the art (Weir et 
al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, 
Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34 Academic Press, N.Y. 
(1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, 
and in situ assays as well as for immunoaffinity purification of the proteins of the present 
invention. 

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS 

The proteins, antibodies and polynucleotides of the present invention can be 
formulated according to known methods to prepare pharmaceutically useful compositions, 
whereby these materials, or their functional derivatives, are combined in admixture with a 
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive 
of other human proteins, e.g., human serum albumin, are described, for example, in 
Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton, PA (1980)). In 
order to form a pharmaceutically acceptable composition suitable for effective administration, 
such compositions will contain an effective amount of one or more of the agents of the present 
invention, together with a suitable amount of carrier vehicle. 



Pharmaceutical compositions for use in accordance with the present invention may be 
formulated in conventional manner using one or more physiologically acceptable carriers or 
excipients. Thus, the compounds and their physiologically acceptable salts and solvates may 
be formulated for administration by inhalation or insufflation (either through the mouth or the 
nose) or oral, buccal, parenteral or rectal administration. 

For oral administration, the pharmaceutical compositions may take the form of, for 
example, tablets or capsules prepared by conventional means with pharmaceutically 
acceptable excipients such as binding agents (e.g., pregelatinised maize starch, 
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, 
microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium 
stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or 
wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well 
known in the art. Liquid preparations for oral administration may take the form of, for 
example, solutions, syrups or suspensions, or they may be presented as a dry product for 
constitution with water or other suitable vehicle before use. Such liquid preparations may be 
prepared by conventional means with pharmaceutically acceptable additives such as 
suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); 
emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily 
esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or 
propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts, 
flavoring, coloring and sweetening agents as appropriate. 

Preparations for oral administration may be suitably formulated to give controlled 
release of the active compound. For buccal administration the composition may take the form 
of tablets or lozenges formulated in conventional manner. 

For administration by inhalation, the compounds for use according to the present 
invention are conveniently delivered in the form of an aerosol spray presentation from 
pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., 
dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide 
or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined 
by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for 
use in an inhaler or insufflator may be formulated containing a powder mix of the compound 
and a suitable powder base such as lactose or starch. 

The compounds may be formulated for parenteral administration by injection, e.g., by 
bolus injection or continuous infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampules or in multi-dose containers, with an added preservative. The 
compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous 
vehicles, and may contain formulatory agents such as suspending, stabilizing and/or 
dispersing agents. Alternatively, the active ingredient may be in powder form for constitution 
with a suitable vehicle, e.g., sterile pyrogen-free water, before use. 

The compounds may also be formulated in rectal compositions such as suppositories 
or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or 
other glycerides. 

In addition to the formulations described previously, the compounds may also be 
formulated as a depot preparation. Such long acting formulations may be administered by 
implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. 
Thus, for example, the compounds may be formulated with suitable polymeric or 
hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange 
resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt. 

The compositions may, if desired, be presented in a pack or dispenser device which 
may contain one or more unit dosage forms containing the active ingredient. The pack may 
for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser 
device may be accompanied by instructions for administration. 

RECOMBINANT CONSTRUCTS AND EXPRESSION 

The present invention further provides recombinant DNA constructs comprising one 
or more of the nucleotide sequences of the present invention. The recombinant constructs of 
the present invention comprise a vector, such as a plasmid or viral vector, into which a DNA 
or DNA fragment, typically bearing an open reading frame, is inserted, in either orientation. 

The gene products encoded by the subject DNAs may be produced by recombinant 
DNA technology using techniques well known in the art. See, for example, the techniques 
described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively, 
the DNA sequences may be chemically synthesized using, for example, synthesizers. See, for 
example, the techniques described in OLIGONUCLEOTIDE SYNTHESIS, 1984, Gait, ed., 
IRL Press, Oxford, which is incorporated by reference herein in its entirety. They may be 
assembled from fragments and short oligonucleotide linkers, or from a series of 
oligonucleotides. They are preferably made by RT-PCR methods. The resulting synthetic 
gene is capable of being expressed in a recombinant vector. 

In some cases the recombinant constructs will be expression vectors, which are 
capable of expressing the RNA and/or protein products of the encoded DNA(s). Thus, the 
vector may further comprise regulatory sequences, including for example, a promoter, 
operably linked to the open reading frame (ORF). The vector may further comprise a 
selectable marker sequence. 

Specific initiation signals may also be required for efficient translation of inserted 
target gene coding sequences. These signals include the ATG initiation codon and adjacent 
sequences. In cases where a target DNA that includes its own initiation codon and adjacent 
sequences is inserted into the appropriate expression vector, no additional translation control 
signals may be needed. However, in cases where only a portion of an ORF is used, 
exogenous translational control signals, including, perhaps, the ATG initiation codon, must be 
provided. Furthermore, the initiation codon must be in phase with the reading frame of the 
desired coding sequence to ensure translation of the entire target. These exogenous 
translational control signals and initiation codons can be of a variety of origins, both natural 
and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate 
transcription enhancer elements, transcription terminators, etc. (see Bittner et al., Methods in 
Enzymol. 153:516-544 (1987)). Some appropriate cloning and expression vectors for use 
with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular 
Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the 
disclosure of which is hereby incorporated by reference. 
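
The requirement that the initiation codon be in phase with the desired coding sequence can be checked mechanically. The following Python sketch locates the first ATG in an insert, reads codons from it until an in-frame stop codon, and tests whether a given coding-sequence start lies in the same phase. The function names and the example insert are hypothetical. Taking the first ATG is a simplifying assumption; it ignores the adjacent sequence context mentioned above.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the ORF beginning at the first ATG, or None if none is found."""
    seq = dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step through complete codons in the frame set by the ATG.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3  # end index is exclusive and includes the stop codon
    return None  # no in-frame stop codon before the sequence ends


def in_phase(initiation_pos: int, coding_start: int) -> bool:
    """True when the coding sequence starts a whole number of codons from the ATG."""
    return (coding_start - initiation_pos) % 3 == 0


# Hypothetical insert: 6-base leader, ATG, three codons, stop codon, 2 trailing bases.
insert = "GGATCC" + "ATG" + "GCTGAAACC" + "TAA" + "GG"
orf = first_orf(insert)
print(orf, insert[orf[0]:orf[1]])
```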

If desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular expression 
organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767. 
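
As a minimal illustration of what codon pairing refers to, the following Python sketch tallies adjacent codon pairs in an open reading frame. Such counts are only the raw statistic from which a codon-pair analysis might begin; the sketch is not the method of the cited patent, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def codon_pair_counts(orf: str) -> Counter:
    """Count each adjacent (codon, next_codon) pair in an in-frame coding sequence."""
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # zip pairs every codon with its immediate downstream neighbour.
    return Counter(zip(codons, codons[1:]))


# Hypothetical five-codon ORF: ATG GCT GCT GCT TAA
counts = codon_pair_counts("ATGGCTGCTGCTTAA")
print(counts.most_common(1))
```

Comparing such counts against the pair frequencies observed in highly expressed genes of the chosen host would be one way to flag pairs for replacement with synonymous alternatives.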

The present invention further provides host cells containing at least one of the DNAs 
of the present invention. The host cell can be virtually any cell for which expression vectors 
are available. It may be, for example, a higher eukaryotic host cell, such as a mammalian 
cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic 
cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can 
be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or 
electroporation (Davis et al., Basic Methods in Molecular Biology (1986)). 

A wide variety of expression systems are available, such as: yeast (e.g., 
Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the 
target DNA; insect cell systems infected with recombinant virus expression vectors (e.g., 
baculovirus) containing the target DNA sequences; plant cell systems infected with 
recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic 
virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) 
containing target DNA coding sequences; or mammalian cell systems (e.g., COS, CHO, 
BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived 
from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian 
viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter). 

Depending on the system chosen, the resulting product may differ. For example, 
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation 
modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern 
different from that expressed in mammalian cells. 

Vectors 

Generally, recombinant expression vectors will include origins of replication and 
selectable markers permitting selection of the host cell, e.g., the ampicillin resistance gene of 
E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to 
direct transcription of a downstream structural sequence. Such promoters can be derived 
from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), 
α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous 
structural sequence is assembled in appropriate phase with translation initiation and 
termination sequences, and in one aspect of the invention, a leader sequence capable of 
directing secretion of translated protein into the periplasmic space or extracellular medium. 
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or 
C-terminal identification peptide imparting desired characteristics, e.g., stabilization or 
simplified purification of expressed recombinant product. 

Bacterial Expression 

Useful expression vectors for bacterial use are constructed by inserting a structural 
DNA sequence encoding a desired protein together with suitable translation initiation and 
termination signals in operable reading phase with a functional promoter. The vector will 
comprise one or more phenotypic selectable markers and an origin of replication to ensure 
maintenance of the vector and, if desirable, to provide amplification within the host. Suitable 
prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella 
typhimurium and various species within the genera Pseudomonas, Streptomyces, and 
Staphylococcus, although others may also be employed as a matter of choice. 

Bacterial vectors may be, for example, bacteriophage-, plasmid- or cosmid-based. 
These vectors can comprise a selectable marker and bacterial origin of replication derived 
from commercially available plasmids typically containing elements of the well known 
cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, 
GEM 1 (Promega Biotec, Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, 
pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, 
pKK232-8, pDR540, and pRIT5 (Pharmacia). 

These "backbone" sections are combined with an appropriate promoter and the 
structural sequence to be expressed. Bacterial promoters include lac, T3, T7, lambda PR or 
PL, trp, and ara. 

Following transformation of a suitable host strain and growth of the host strain to an 
appropriate cell density, the selected promoter is derepressed/induced by appropriate means 
(e.g., temperature shift or chemical induction) and cells are cultured for an additional period. 
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and 
the resulting crude extract retained for further purification. 

In bacterial systems, a number of expression vectors may be advantageously selected 
depending upon the use intended for the protein being expressed. For example, when a large 
quantity of such a protein is to be produced, for the generation of antibodies or to screen 
peptide libraries, for example, vectors which direct the expression of high levels of fusion 
protein products that are readily purified may be desirable. Such vectors include, but are not 
limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in 
which the coding sequence may be ligated into the vector in frame with the lac Z coding 
region so that a fusion protein is produced; pIN vectors (Inouye et al. 1985, Nucleic Acids 
Res. 13:3101-3109; Van Heeke et al., 1989, J. Biol. Chem. 264:5503-5509); pET vectors, 
Studier et al., Methods in Enzymology 185:60-89 (Academic Press 1990); and the like. 

Moreover, pGEX vectors may be used to express foreign polypeptides as fusion 
proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble 
and easily can be purified from lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The pGEX vectors are designed to 
include thrombin or factor Xa protease cleavage sites so that the cloned target gene protein 
can be released from the GST moiety. 

In one embodiment, full length cDNA sequences are appended with in-frame BamHI 
sites at the amino terminus and EcoRI sites at the carboxyl terminus using standard PCR 
methodologies (Innis et al., 1990, supra) and ligated into the pGEX-2TK vector (Pharmacia, 
Uppsala, Sweden). The resulting cDNA construct contains a kinase recognition site at the 
amino terminus for radioactive labeling and glutathione S-transferase sequences at the 
carboxyl terminus for affinity purification (Nilsson et al., 1985, EMBO J. 4:1075; Zabeau 
and Stanley, 1982, EMBO J. 1:1217). 
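
Appending restriction sites by PCR, as in the embodiment above, is done by placing each site at the 5' end of a primer. The following Python sketch builds such a primer pair for the BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences. The function name, the three-base clamp, the annealing length, the frame_pad parameter and the example cDNA are all hypothetical choices made for illustration. Whether a given construct is in frame depends on the reading frame of the vector's cloning site, which this sketch does not model; frame_pad is merely a place to add bases for that purpose.

```python
BAMHI_SITE = "GGATCC"
ECORI_SITE = "GAATTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def design_primers(cdna: str, anneal_len: int = 18, clamp: str = "CGC", frame_pad: str = ""):
    """Return (forward, reverse) primers adding BamHI at the 5' end and EcoRI at the 3' end."""
    seq = cdna.upper()
    if len(seq) < anneal_len:
        raise ValueError("cDNA is shorter than the requested annealing length")
    for site in (BAMHI_SITE, ECORI_SITE):
        # An internal site would be cut during cloning, so refuse such inserts.
        if site in seq:
            raise ValueError(f"cDNA contains an internal {site} site")
    # frame_pad (optional extra bases) can shift the insert relative to the vector frame.
    forward = clamp + BAMHI_SITE + frame_pad + seq[:anneal_len]
    # The reverse primer anneals to the 3' end, so its annealing part is reverse-complemented.
    reverse = clamp + ECORI_SITE + reverse_complement(seq[-anneal_len:])
    return forward, reverse


# Hypothetical 33-base cDNA used only to exercise the function.
cdna = "ATGGCTGAAACCAAGCTGGTTCGTAGCCTGTAA"
fwd, rev = design_primers(cdna, anneal_len=12)
print(fwd)
print(rev)
```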

Eukaryotic Expression 

Various mammalian cell culture systems can also be employed to express recombinant 
protein. Examples of mammalian expression systems include the COS-7 lines of monkey 
kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of 
expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell 
lines. Mammalian expression vectors will comprise an origin of replication, a suitable 
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, 
splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking 
nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for 
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be 
used to provide the required nontranscribed genetic elements. 

Mammalian promoters include CMV immediate early, HSV thymidine kinase, early 
and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Exemplary mammalian 
vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, 
and pSVL (Pharmacia). Selectable markers include CAT (chloramphenicol transferase). 

In mammalian host cells, a number of viral-based expression systems may be utilized. 
In cases where an adenovirus is used as an expression vector, the coding sequence of interest 
may be ligated to an adenovirus transcription/translation control complex, e.g., the late 
promoter and tripartite leader sequence. This chimeric gene may then be inserted in the 
adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of 
the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and 
capable of expressing a target protein in infected hosts (see, e.g., Logan et al., 1984, Proc. 
Natl. Acad. Sci. USA 81:3655-3659). 

In one embodiment, cDNA sequences encoding the full-length open reading frames 
are ligated into pCMVβ replacing the β-galactosidase gene such that cDNA expression is 
driven by the CMV promoter (Alam, 1990, Anal. Biochem. 188:245-254; MacGregor et al., 
1989, Nucl. Acids Res. 17:2365; Norton et al., 1985, Mol. Cell. Biol. 5:281). 

In addition, a host cell strain may be chosen which modulates the expression of the 
inserted sequences, or modifies and processes the gene product in the specific fashion desired. 
Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products 
may be important for the function of the protein. Different host cells have characteristic and 
specific mechanisms for the post-translational processing and modification of proteins. 

Appropriate cell lines or host systems can be chosen to ensure the correct modification 
and processing of the foreign protein expressed. To this end, eukaryotic host cells which 
possess the cellular machinery for proper processing of the primary transcript, glycosylation, 
and phosphorylation of the gene product may be used. Such mammalian host cells include 
but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, etc. 

For long-term, high-yield production of recombinant proteins in eukaryotic cells, 
stable expression is preferred. Rather than using expression vectors which contain viral 
origins of replication, host cells can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer sequences, transcription terminators, 
polyadenylation sites, etc.), and a selectable marker. 

Following the introduction of the foreign DNA, engineered cells may be allowed to 
grow for 1-2 days in an enriched medium, and then are switched to a selective medium. The 
selectable marker in the recombinant plasmid confers resistance to the selection and allows 
cells to stably integrate the plasmid into their chromosomes and grow to form foci which in 
turn can be cloned and expanded into cell lines. This method may advantageously be used to 
engineer cell lines which express the target protein. Such engineered cell lines may be 
particularly useful in screening and evaluation of compounds that affect the endogenous 
activity of the protein. 

A number of selection systems may be used, including but not limited to the herpes 
simplex virus thymidine kinase (Wigler et al., Cell 11:223 (1977)), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad. Sci. USA 48:2026 (1962)), and 
adenine phosphoribosyltransferase (Lowy et al., Cell 22:817 (1980)) genes, which can be employed 
in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite resistance can be used as the 
basis of selection for dhfr, which confers resistance to methotrexate (Wigler et al., Proc. 
Natl. Acad. Sci. USA 77:3567 (1980); O'Hare et al., 1981, Proc. Natl. Acad. Sci. USA 
78:1527); gpt, which confers resistance to mycophenolic acid (Mulligan et al., Proc. Natl. 
Acad. Sci. USA 78:2072 (1981)); neo, which confers resistance to the aminoglycoside G-418 
(Colberre-Garapin et al., 1981, J. Mol. Biol. 150:1); and hygro, which confers resistance to 
hygromycin (Santerre et al., 1984, Gene 30:147) genes. 
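
By way of non-limiting illustration, the selection systems listed above can be summarized as a simple lookup from marker gene to the condition the selection step requires. The following is a minimal sketch only; the dictionary layout, the field names and the function name are assumptions made for illustration and are not part of the disclosure.

```python
# Illustrative lookup of the selection systems named in the preceding paragraph.
# Keys and field names are hypothetical; the entries restate the text only.
SELECTION_SYSTEMS = {
    "tk":    {"kind": "complementation", "host": "tk-"},
    "hgprt": {"kind": "complementation", "host": "hgprt-"},
    "aprt":  {"kind": "complementation", "host": "aprt-"},
    "dhfr":  {"kind": "resistance", "agent": "methotrexate"},
    "gpt":   {"kind": "resistance", "agent": "mycophenolic acid"},
    "neo":   {"kind": "resistance", "agent": "G-418"},
    "hygro": {"kind": "resistance", "agent": "hygromycin"},  # hygromycin-resistance gene
}


def selection_requirement(marker: str) -> str:
    """Return what the selection step needs for a given marker gene."""
    entry = SELECTION_SYSTEMS[marker]
    if entry["kind"] == "complementation":
        # Marker rescues a deficient host, so the host genotype matters.
        return f"use {entry['host']} host cells"
    # Marker confers drug resistance, so the drug goes into the medium.
    return f"add {entry['agent']} to the selective medium"


print(selection_requirement("neo"))
print(selection_requirement("tk"))
```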

An alternative fusion protein system allows for the ready purification of non-denatured 
fusion proteins expressed in human cell lines (Janknecht et al., Proc. Natl. Acad. Sci. USA 
88: 8972-8976 (1991)). In this system, the gene of interest is subcloned into a vaccinia-based 
plasmid such that the gene's open reading frame is translationally fused to an amino-terminal 
tag consisting of six histidine residues. Extracts from cells infected with recombinant 
vaccinia virus are loaded onto Ni²⁺-nitrilotriacetic acid-agarose columns and histidine-tagged 
proteins are selectively eluted with imidazole-containing buffers. 
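
By way of non-limiting illustration, the in-frame fusion of an amino-terminal six-histidine tag can be sketched at the sequence level as follows. This is a simplified sketch, not a cloning protocol: the function name, the choice of the CAT codon for histidine, and the short example open reading frame are all assumptions made for illustration.

```python
HIS_CODON = "CAT"  # one of the two histidine codons (CAT or CAC); an arbitrary choice


def add_n_terminal_his_tag(orf: str, n_his: int = 6) -> str:
    """Fuse ATG plus n_his histidine codons, in frame, to the 5' end of an ORF."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Drop the ORF's own start codon so that the tag supplies the initiator Met.
    body = orf[3:] if orf.startswith("ATG") else orf
    return "ATG" + HIS_CODON * n_his + body


example_orf = "ATGGCTGAAAAATAA"  # hypothetical ORF: Met-Ala-Glu-Lys-stop
tagged = add_n_terminal_his_tag(example_orf)
print(tagged)
```

Because the tag adds a whole number of codons, the reading frame of the downstream sequence is unchanged.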

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is 
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. 
The target coding sequence may be cloned individually into non-essential regions (for 
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter 
(for example the polyhedrin promoter). Successful insertion of a target gene coding sequence 
will result in inactivation of the polyhedrin gene and production of non-occluded recombinant 
virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). These 
recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted 
gene is expressed. (E.g., see Smith et al., 1983, J. Virol. 46: 584; Smith, U.S. Patent No. 
4,215,051). 

While the present proteins can be expressed in recombinant systems, as described 
above, cell-free translation systems can also be employed to produce such proteins using 
RNAs derived from the DNA constructs of the present invention. 

Purification of Recombinant Proteins 

Recombinant proteins produced may be isolated by host cell lysis. This may be 
followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography 
steps. Finally, high performance liquid chromatography (HPLC) can be employed for final 
purification steps. Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use 
of cell lysing agents, like lysozyme and chelators. 

If inclusion bodies are formed in bacterial systems, they may be extracted from cell 
pellets using, for example, detergents, reducing agents, salts, urea, guanidinium chloride and 
extremes of pH (e.g. <4 or >10). If denaturation occurs, protein refolding steps (e.g., 
dialysis) can be used, as necessary, in completing configuration of the mature protein. If 
disulfide bridges are present in the native protein, they may be reoxidized using known 
methods. 

By way of specific non-limiting example, the recombinant bacterial cells, for example 
E. coli, are grown in any of a number of suitable media, for example LB, and the expression 
of the recombinant protein is induced by adding IPTG (e.g., for a lac operator-promoter) to the media 
or switching incubation to a higher temperature (e.g., for λ cI857). After culturing the bacteria for 
a further period of between 2 and 24 hours, the cells are collected by centrifugation and 
washed to remove residual media. The bacterial cells are then lysed, for example, by 
disruption in a cell homogenizer and centrifuged to separate the cell membranes from the 
soluble cell components. If the protein aggregates into inclusion bodies, this centrifugation 
can be performed under conditions whereby the dense inclusion bodies are selectively 
enriched by incorporation of sugars such as sucrose into the buffer and centrifugation at a 
selective speed. The inclusion bodies can then be washed in any of several solutions to 
remove some of the contaminating host proteins, then solubilized in solutions containing high 
concentrations of urea (e.g., 8 M) or chaotropic agents such as guanidinium hydrochloride in 
the presence of reducing agents such as β-mercaptoethanol or DTT (dithiothreitol). 

At this stage it may be advantageous to incubate the protein for several hours under 
conditions suitable for the protein to undergo a refolding process into a conformation which 
more closely resembles that of the native protein. Such conditions generally include low 
protein concentrations (less than 500 µg/ml), low levels of reducing agent, concentrations of 
urea less than 2 M and often the presence of reagents such as a mixture of reduced and 
oxidized glutathione which facilitate the interchange of disulphide bonds within the protein 
molecule. The refolding process can be monitored, for example, by SDS-PAGE or with 
antibodies which are specific for the native molecule. Following refolding, the protein can 
then be purified further and separated from the refolding mixture by chromatography on any 
of several supports including ion exchange resins, gel permeation resins or on a variety of 
affinity columns. 
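
By way of non-limiting illustration, the dilution needed to move from solubilization conditions (e.g., 8 M urea) to refolding conditions (urea below 2 M, protein below 500 µg/ml) can be worked out as follows. This is a sketch under stated assumptions: the function name is hypothetical and the starting protein concentration of 1500 µg/ml is an invented example value.

```python
def min_dilution_factor(start_urea_m: float, max_urea_m: float,
                        start_protein_ug_ml: float, max_protein_ug_ml: float) -> float:
    """Fold-dilution at which both urea and protein just reach their limits."""
    urea_fold = start_urea_m / max_urea_m
    protein_fold = start_protein_ug_ml / max_protein_ug_ml
    # Whichever component needs the larger dilution sets the requirement;
    # a value of 1.0 means no dilution is needed.
    return max(urea_fold, protein_fold, 1.0)


# 8 M urea -> 2 M limit; hypothetical 1500 ug/ml protein -> 500 ug/ml limit.
fold = min_dilution_factor(8.0, 2.0, 1500.0, 500.0)
print(fold)  # 4.0
```

Because the limits are stated as "less than", the dilution actually used would need to exceed the computed factor.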

Labeling Proteins 

When used as a component in assay systems such as those described below, the target 
protein may be labeled, either directly or indirectly, to facilitate detection of the present res-like 
molecules either in vitro or in vivo. Any of a variety of suitable labeling systems may be 
used including but not limited to radioisotopes such as ¹²⁵I; enzyme labeling systems that 
generate a detectable colorimetric signal or light when exposed to substrate; and fluorescent 
labels. 

Where recombinant DNA technology is used for protein production, it may be 
advantageous to engineer fusion proteins that can facilitate labeling, immobilization and/or 
detection. These fusion proteins may, for example, add amino acids which facilitate further 
chemical modification. They also may add a functional moiety, such as an enzyme, which 
directly facilitates detection. 

TRANSGENIC ANIMALS 

The invention further contemplates animal models for studying the function of the 
present molecules and for overproducing the protein products. The disclosed DNA sequences 
may be used in conjunction with techniques for producing transgenic animals that are well 
known to those of skill in the art. 

To prepare transgenic animals, target gene sequences may for example be introduced 
into, and overexpressed in, the genome of the animal of interest, or, if endogenous target 
gene sequences are present, they may either be overexpressed or, alternatively, be disrupted 
in order to underexpress or inactivate target gene expression, such as described for the 
disruption of apoE in mice (Plump et al., Cell 71:343-353 (1992)). 

In order to overexpress a target gene sequence, the coding portion of the target gene 
sequence may be ligated to a regulatory sequence which is capable of driving gene expression 
in the animal and cell type of interest. Such regulatory regions will be well known to those of 
skill in the art, and may be utilized in the absence of undue experimentation. 

For underexpression of an endogenous target gene sequence, such a sequence may be 
isolated and engineered such that when reintroduced into the genome of the animal of interest, 
the endogenous target gene alleles will be inactivated. Preferably, the engineered target gene 
sequence is introduced via gene targeting such that the endogenous target sequence is 
disrupted upon integration of the engineered target gene sequence into the animal's genome. 

Animals of any species, including, but not limited to, mice, rats, rabbits, guinea pigs, 
pigs, micro-pigs, goats, and non-human primates, e.g., baboons, monkeys, and chimpanzees 
may be used to generate cardiovascular disease animal models. Goats, cows and sheep are 
particularly preferred for producing protein in vivo. 

Any technique known in the art may be used to introduce a target gene transgene into 
animals to produce the founder lines of transgenic animals. Such techniques include, but are 
not limited to pronuclear microinjection (Hoppe et al., U.S. Pat. No. 4,873,191 (1989)); 
retrovirus mediated gene transfer into germ lines (Van der Putten et al., Proc. Natl. Acad. 
Sci. USA 82:6148-6152 (1985)); gene targeting in embryonic stem cells (Thompson et al., 
Cell 56:313-321 (1989)); electroporation of embryos (Lo, Mol. Cell. Biol. 3:1803-1814 
(1983)); and sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723 (1989)); etc. 
For a review of such techniques, see Gordon, Transgenic Animals, Intl. Rev. Cytol. 115:171-229 
(1989). 

The present invention provides for transgenic animals that carry the transgene in all 
their cells, as well as animals which carry the transgene in some, but not all their cells, i.e., 
mosaic animals. The transgene may be integrated as a single transgene or in concatamers, 
e.g., head-to-head tandems or head-to-tail tandems. The transgene may also be selectively 
introduced into and activated in a particular cell type by following, for example, the teaching 
of Lasko et al. (Lasko et al., Proc. Natl. Acad. Sci. USA 89:6232-6236 (1992)). The 
regulatory sequences required for such a cell-type specific activation will depend upon the 
particular cell type of interest, and will be apparent to those of skill in the art. When it is 
desired that the target gene be integrated into the chromosomal site of the endogenous target 
gene, gene targeting is preferred. Briefly, when such a technique is to be utilized, vectors 
containing some nucleotide sequences homologous to the endogenous target gene of interest 
are designed for the purpose of integrating, via homologous recombination with chromosomal 
sequences, into and disrupting the function of the nucleotide sequence of the endogenous 
target gene. 

The transgene may also be selectively introduced into a particular cell type, thus 
inactivating the endogenous gene of interest in only that cell type, by following, for example, 
the teaching of Gu et al. (Science 265:103-106 (1994)). The regulatory sequences required 
for such a cell-type specific inactivation will depend upon the particular cell type of interest, 
and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expression of the recombinant target 
gene and protein may be assayed utilizing standard techniques. Initial screening may be 
accomplished by Southern blot analysis or PCR techniques to analyze animal tissues to assay 
whether integration of the transgene has taken place. The level of mRNA expression of the 
transgene in the tissues of the transgenic animals may also be assessed using techniques which 
include but are not limited to Northern blot analysis of tissue samples obtained from the 
animal, in situ hybridization analysis, and RT-PCR. Samples of target gene-expressing 
tissue may also be evaluated immunocytochemically using antibodies specific for the target 
gene transgene product of interest. 

The transgenic animals that express target gene mRNA or target gene transgene 
peptide (detected immunocytochemically, using antibodies directed against the target gene 
product's epitopes) at easily detectable levels should then be further evaluated to identify those 
animals which display characteristic increased susceptibility to carcinogenesis. Additionally, 
specific cell types within the transgenic animals may be analyzed and assayed in vitro for 
cellular phenotypes characteristic of mutant phenotype. 

Once target gene transgenic founder animals are produced, they may be bred, inbred, 
outbred, or crossbred to produce colonies of the particular animal. Examples of such 
breeding strategies include but are not limited to: outbreeding of founder animals with more 
than one integration site in order to establish separate lines; inbreeding of separate lines in 
order to produce compound target gene transgenics that express the target gene transgene of 
interest at higher levels because of the effects of additive expression of each target gene 
transgene; crossing of heterozygous transgenic animals to produce animals homozygous for a 
given integration site in order both to augment expression and eliminate the possible need for 
screening of animals by DNA analysis; crossing of separate homozygous lines to produce 
compound heterozygous or homozygous lines; breeding animals to different inbred genetic 
backgrounds so as to examine effects of modifying alleles on expression of the target gene 
transgene and the possible development of carcinogenesis. One such approach is to cross the 
target gene transgenic founder animals with a wild type strain to produce an F1 generation 
that exhibits increased susceptibility to carcinogenesis. The F1 generation may then be inbred 
in order to develop a homozygous line, if it is found that homozygous target gene transgenic 
animals are viable. 
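
By way of non-limiting illustration, the expected genotype fractions from the crosses described above can be tabulated as follows. This is a minimal sketch that assumes a single integration site, simple Mendelian segregation, and a non-mosaic founder; the function name and the 'T' / '-' allele notation are hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Expected genotype fractions for a single-locus cross.

    Genotypes are two-character strings: 'T' marks the transgene at a given
    integration site and '-' marks its absence (e.g. 'T-' is heterozygous).
    """
    # Each parent contributes one allele; sort so 'T-' and '-T' are pooled.
    offspring = Counter(
        "".join(sorted(pair, reverse=True)) for pair in product(parent_a, parent_b)
    )
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


f1 = cross("T-", "--")  # founder x wild type
f2 = cross("T-", "T-")  # intercross of heterozygous F1 animals
print(f1)
print(f2)
```

Under these assumptions, half of the F1 animals carry the transgene and one quarter of the intercross progeny are homozygous for the integration site.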

Methods of generating "knockout" mice using homologous recombination in 
embryonic stem cells are well known in the art. Suitable methods are described, for example, 
in Mansour et al., Nature 336:348 (1988); Zijlstra et al., Nature 342:435 (1989) and 
344:742 (1990); and Hasty et al., Nature 350:243 (1991). The genomic DNA for the target gene can be 
obtained by conventional methods using the cDNA sequence as a probe in a commercially 
available genomic DNA library. 

Briefly, a genomic fragment is cleaved with a restriction endonuclease and a 
heterologous cassette containing a neomycin-resistance gene is inserted at the cleavage site. A 
suitable cassette is the GTI-II neo cassette described by Lufkin et al, Cell 66:1105 (1991). 
The modified genomic fragment is cloned into a suitable targeting vector that is introduced 
into murine embryonic stem cells by electroporation. Cells that have undergone homologous 
recombination (and hence disruption of the gene) are selected by resistance to G418, and used 
to generate chimeric mice using well known methods. See Lufkin et al, supra. Traditional 
breeding methods then can be used to generate mice that are homozygous for the disrupted 
gene. 
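
By way of non-limiting illustration, the insertion of a cassette at a restriction cleavage site can be sketched at the sequence level as follows. This is a string-level simplification that ignores end compatibility and the second strand; the function name, the example fragment, the EcoRI site and the "[neo]" placeholder standing in for the cassette are all assumptions made for illustration.

```python
def insert_cassette(fragment: str, site: str, cut_offset: int, cassette: str) -> str:
    """Insert `cassette` at the single occurrence of `site` in `fragment`.

    cut_offset is the position within the recognition site at which the
    enzyme cuts the top strand (e.g. 1 for EcoRI, G^AATTC).
    """
    if fragment.count(site) != 1:
        raise ValueError("recognition site must occur exactly once in the fragment")
    cut = fragment.index(site) + cut_offset
    return fragment[:cut] + cassette + fragment[cut:]


genomic_fragment = "ATGGCCGAATTCGGATAA"  # hypothetical fragment with one EcoRI site
disrupted = insert_cassette(genomic_fragment, "GAATTC", 1, "[neo]")
print(disrupted)  # ATGGCCG[neo]AATTCGGATAA
```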

The phenotype of mice that are homozygous for the mutation then can be studied to 
provide insights into the role of the protein in, for example, carcinogenesis. These mice also 
can be used as models for developing new treatments for cancers. If this mutation is lethal in 
homozygous mice (for example, during embryogenesis), heterozygous mice, which express 
only half the amount of the protein, can also be studied. 



GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements controlling expression of 
that protein, are found to be associated with a malignant phenotype, control of cellular 
proliferation can be restored by gene therapy methods. For example, overexpression of the 
protein can be counteracted by concurrent expression of an antisense molecule that binds to 
and inhibits expression of the mRNA encoding the protein. Alternatively, overexpression can 
be inhibited in an analogous manner using a ribozyme that cleaves the mRNA. In another 
embodiment, where expression of a mutated protein induces the malignant phenotype, 
concomitant expression of the non-mutated molecule via introduction of an exogenous gene 
may be used. Methods of using antisense and ribozyme technology to control gene 
expression, or of gene therapy methods for expression of an exogenous gene in this manner 
are well known in the art. 
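
By way of non-limiting illustration, an antisense sequence is the reverse complement of the mRNA segment it is intended to bind. The following is a minimal sketch; the function name and the short target sequence are hypothetical and are not taken from the disclosed sequences.

```python
# Translation table for RNA base pairing: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the antisense RNA (reverse complement), written 5' to 3'."""
    return mrna.upper().translate(COMPLEMENT)[::-1]


target = "AUGGCCAUUGUA"  # hypothetical mRNA segment
print(antisense_rna(target))  # UACAAUGGCCAU
```

Applying the function twice returns the original sequence, which is a convenient sanity check.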

Each of these methods requires a system for introducing a vector into the cells 
containing the mutated gene. The vector encodes either an antisense or ribozyme transcript of 
the inventive protein. The construction of a suitable vector can be achieved by any of the 
methods well-known in the art for the insertion of exogenous DNA into a vector. See, e.g., 
Sambrook et al., Molecular Cloning (Cold Spring Harbor Press 2d ed. 1989), which is 
incorporated herein by reference. In addition, the prior art teaches various methods of 
introducing exogenous genes into cells in vivo. See Rosenberg et al. , Science 242: 1575-1578 
(1988) and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated herein by 
reference. The routes of delivery include systemic administration and administration in situ. 
Well-known techniques include systemic administration with cationic liposomes, and 
administration in situ with viral vectors. Any one of the gene delivery methodologies 
described in the prior art is suitable for the introduction of a recombinant vector containing an 
inventive gene according to the invention into a MTX-resistant, transport-deficient cancer 
cell. A listing of present-day vectors suitable for the purpose of this invention is set forth in 
Hodgson, Bio/Technology 13:222 (1995), which is incorporated by reference. 

For example, liposome-mediated gene transfer is a suitable method for the 
introduction of a recombinant vector containing an inventive gene according to the invention 
into a MTX-resistant, transport-deficient cancer cell. The use of a cationic liposome, such as 
DC-Chol/DOPE liposome, has been widely documented as an appropriate vehicle to deliver 
DNA to a wide range of tissues through intravenous injection of DNA/cationic liposome 
complexes. See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al., Science 261:209-211 
(1993), which are herein incorporated by reference. Liposomes transfer genes to the 
target cells by fusing with the plasma membrane. The entry process is relatively efficient, but 
once inside the cell, the liposome-DNA complex has no inherent mechanism to deliver the 
DNA to the nucleus. As such, most of the lipid and DNA gets shunted to cytoplasmic 
waste systems and destroyed. The obvious advantage of liposomes as a gene therapy vector is 
that liposomes contain no proteins, which thus minimizes the potential of host immune 
responses. 

As another example, viral vector-mediated gene transfer is also a suitable method for 
the introduction of the vector into a target cell. Appropriate viral vectors include adenovirus 
vectors, adeno-associated virus vectors, retrovirus vectors and herpesvirus vectors. 

Adenoviruses are linear, double stranded DNA viruses complexed with core proteins 
and surrounded by capsid proteins. The common serotypes 2 and 5, which are not associated 
with any human malignancies, are typically the base vectors. By deleting parts of the virus 
genome and inserting the desired gene under the control of a constitutive viral promoter, the 
virus becomes a replication deficient vector capable of transferring the exogenous DNA to 
differentiated, non-proliferating cells. To enter cells, the adenovirus fibre interacts with 
specific receptors on the cell surface, and the adenovirus surface proteins interact with the cell 
surface integrins. The virus penton-cell integrin interaction provides the signal that brings the 
exogenous gene-containing virus into a cytoplasmic endosome. The adenovirus breaks out of 
the endosome and moves to the nucleus, the viral capsid falls apart, and the exogenous DNA 
enters the cell nucleus where it functions, in an epichromosomal fashion, to express the 
exogenous gene. Detailed discussions of the use of adenoviral vectors for gene therapy can 
be found in Berkner, Biotechniques 6:616-629 (1988) and Trapnell, Advanced Drug Delivery 
Rev. 12:185-199 (1993), which are herein incorporated by reference. Adenovirus-derived 
vectors, particularly non-replicative adenovirus vectors, are characterized by their ability to 
accommodate exogenous DNA of 7.5 kb, relative stability, wide host range, low 
pathogenicity in man, and high titers (10⁴ to 10⁵ plaque forming units per cell). See 
Stratford-Perricaudet et al., PNAS 89:2581 (1992). 

Adeno-associated virus (AAV) vectors also can be used for the present invention. 
AAV is a linear single-stranded DNA parvovirus that is endogenous to many mammalian 
species. AAV has a broad host range despite the limitation that AAV is a defective 
parvovirus which is dependent totally on either adenovirus or herpesvirus for its reproduction 
in vivo. The use of AAV as a vector for the introduction into target cells of exogenous DNA 
is well-known in the art. See, e.g., Lebkowski et al., Mol. Cell. Biol. 8:3988 (1988), 
which is incorporated herein by reference. In these vectors, the capsid gene of AAV is 
replaced by a desired DNA fragment, and transcomplementation of the deleted capsid 
function is used to create a recombinant virus stock. Upon infection the recombinant virus 
uncoats in the nucleus and integrates into the host genome. 

Another suitable virus-based gene delivery mechanism is retroviral vector-mediated 
gene transfer. In general, retroviral vectors are well-known in the art. See Breakfield et al., 
Mole. Neuro. Biol. 1:339 (1987) and Shih et al., in Vaccines 85:177 (Cold Spring Harbor 
Press 1985). A variety of retroviral vectors and retroviral vector-producing cell lines can be 
used for the present invention. Appropriate retroviral vectors include Moloney Murine 
Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous 
Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, human immunodeficiency virus, 
myeloproliferative sarcoma virus, and mammary tumor virus. These vectors include 
replication-competent and replication-defective retroviral vectors. In addition, amphotropic 
and xenotropic retroviral vectors can be used. In carrying out the invention, retroviral 
vectors can be introduced to a tumor directly or in the form of free retroviral vector-producing 
cell lines. Suitable producer cells include fibroblasts, neurons, glial cells, 
keratinocytes, hepatocytes, connective tissue cells, ependymal cells, and chromaffin cells. See 
Wolff et al., PNAS 84:3344 (1989). 

Retroviral vectors generally are constructed such that the majority of their structural 
genes are deleted or replaced by exogenous DNA of interest, and such that the likelihood is 
reduced that viral proteins will be expressed. See Bender et al., J. Virol. 61:1639 (1987) and 
Armento et al., J. Virol. 61:1647 (1987), which are herein incorporated by reference. To 
facilitate expression of the antisense or ribozyme molecule of the inventive protein, a 
retroviral vector employed in the present invention must integrate into the genome of the host 
cell, an event which occurs only in mitotically active cells. The necessity for host 
cell replication effectively limits retroviral gene expression to tumor cells, which are highly 
replicative, and to a few normal tissues. The normal tissue cells theoretically most likely to 
be transduced by a retroviral vector, therefore, are the endothelial cells that line the blood 
vessels that supply blood to the tumor. In addition, it is also possible that a retroviral vector 
would integrate into white blood cells either in the tumor or in the blood circulating through 
the tumor. 

The spread of retroviral vector to normal tissues, however, is limited. The local 
administration to a tumor of a retroviral vector or retroviral vector producing cells will 
restrict vector propagation to the local region of the tumor, minimizing transduction, 
integration, expression and subsequent cytotoxic effect on surrounding cells that are 
mitotically active. 

Both replicatively deficient and replicatively competent retroviral vectors can be used 
in the invention, subject to their respective advantages and disadvantages. For instance, for 
tumors that have spread regionally, such as lung cancers, the direct injection of cell lines that 
produce replication-deficient vectors may not deliver the vector to a large enough area to 
completely eradicate the tumor, since the vector will be released only from the original 
producer cells and their progeny, and diffusion is limited. Similar constraints apply to the 
application of replication deficient vectors to tumors that grow slowly, such as human breast 
cancers which typically have doubling times of 30 days versus the 24 hours common among 
human gliomas. The much shortened survival-time of the producer cells, probably no more 
than 7-14 days in the absence of immunosuppression, limits to only a portion of their 
replicative cycle the exposure of the tumor cells to the retroviral vector. 
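
By way of non-limiting illustration, the mismatch between producer cell survival and tumor doubling time can be made concrete with the figures given above (30-day doubling for breast cancers, 24-hour doubling for gliomas, 7-14 days of producer cell survival). The function name is hypothetical and the calculation is simple arithmetic, not a kinetic model.

```python
def doublings_during_exposure(exposure_days: float, doubling_time_days: float) -> float:
    """Number of tumor doubling times spanned by the vector exposure window."""
    return exposure_days / doubling_time_days


for label, doubling_days in (("breast cancer", 30.0), ("glioma", 1.0)):
    for exposure in (7.0, 14.0):
        n = doublings_during_exposure(exposure, doubling_days)
        print(f"{label}: {exposure:.0f} days of exposure = {n:.2f} doubling times")
```

On these figures, even 14 days of producer cell survival covers less than half of one doubling time for a slowly growing breast tumor, whereas it spans roughly 14 doubling times for a glioma.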

The use of replication-defective retroviruses for treating tumors requires producer 
cells and is limited because each replication-defective retrovirus particle can enter only a 
single cell and cannot productively infect others thereafter. Because these replication- 
defective retroviruses cannot spread to other tumor cells, they would be unable to completely 
penetrate a deep, multilayered tumor in vivo. See Markert et al., Neurosurg. 77:590 (1992). 
The injection of replication-competent retroviral vector particles or a cell line that produces a 
replication-competent retroviral vector virus may prove to be a more effective therapeutic 
because a replication competent retroviral vector will establish a productive infection that will 
transduce cells as long as it persists. Moreover, replicatively competent retroviral vectors 
may follow the tumor as it metastasizes, carried along and propagated by transduced tumor 
cells. The risks of complications, however, are greater with replicatively competent vectors. 
Such vectors may pose a greater risk than replicatively deficient vectors of transducing normal 
tissues, for instance. The risks of undesired vector propagation for each type of cancer and 
affected body area can be weighed against the advantages in the situation of replicatively 
competent versus replicatively deficient retroviral vector to determine an optimum treatment. 

Both amphotropic and xenotropic retroviral vectors may be used in the invention. 
Amphotropic viruses have a very broad host range that includes most or all mammalian cells, 
as is well known to the art. Xenotropic viruses can infect all mammalian cells except 
mouse cells. Thus, amphotropic and xenotropic retroviruses from many species, including 
cows, sheep, pigs, dogs, cats, rats, and mice, inter alia can be used to provide retroviral 
vectors in accordance with the invention, provided the vectors can transfer genes into 
proliferating human cells in vivo. 

Clinical trials employing retroviral vector therapy treatment of cancer have been 
approved in the United States. See Culver, Clin. Chem. 40: 510 (1994). Retroviral vector- 
containing cells have been implanted into brain tumors growing in human patients. See 
Oldfield et al, Hum. Gene Ther. 4: 39 (1993). These retroviral vectors carried the HSV-1 
thymidine kinase (HSV-tk) gene into the surrounding brain tumor cells, which conferred 
sensitivity of the tumor cells to the antiviral drug ganciclovir. Some of the limitations of 
current retroviral based cancer therapy, as described by Oldfield are: (1) the low titer of virus 
produced, (2) virus spread is limited to the region surrounding the producer cell implant, (3) 
possible immune response to the producer cell line, (4) possible insertional mutagenesis and 
transformation of retroviral infected cells, (5) only a single treatment regimen of pro-drug, 
ganciclovir, is possible because the "suicide" product kills retrovirally infected cells and 
producer cells and (6) the bystander effect is limited to cells in direct contact with retrovirally 
transformed cells. See Bi et al. , Human Gene Therapy 4: 725 (1993). 

Yet another suitable virus-based gene delivery mechanism is herpesvirus vector- 
mediated gene transfer. While much less is known about the use of herpesvirus vectors, 
replication-competent HSV-1 viral vectors have been described in the context of antitumor 
therapy. See Martuza et al, Science 252: 854 (1991), which is incorporated herein by 
reference. 






WO 01/12659 PCT/IB00/01496 

DIAGNOSTIC METHODS 



The present invention also contemplates, for certain molecules described below, 
methods for diagnosis of human disease. In particular, patients can be screened for the 
occurrence of cancers, or likelihood of occurrence of cancers, associated with mutations in 
the encoded protein. DNA from tumor tissue obtained from patients suffering from cancer 
can be isolated and the gene encoding the protein can be sequenced. By examining a number 
of patients in this manner, mutations in the gene that are associated with a malignant cellular 
phenotype can be identified. In addition, correlation of the nature of the observed mutations 
with subsequent observed clinical outcomes allows development of a prognostic model for the
predicted outcome in a particular patient. 

Screening for mutations conveniently can be carried out at the DNA level by use of 
PCR, although the skilled artisan will be aware that many other well known methods are 
available for the screening. PCR primers can be selected that flank known mutation sites, and 
the PCR products can be sequenced to detect the occurrence of the mutation. Alternatively, 
the 3' residue of one PCR primer can be selected to be a match only for the residue found in
the unmutated gene. If the gene is mutated, there will be a mismatch at the 3' end of the 
primer, and primer extension cannot occur, and no PCR product will be obtained. 
Alternatively, primer mixtures can be used where the 3' residue of one primer is any 
nucleotide other than the nonmutated residue. Observation of a PCR product then indicates 
that a mutation has occurred. Other methods of using, for example, oligonucleotide probes to
screen for mutations are described, for example, in U.S. Patent No. 4,871,838, which is
herein incorporated by reference in its entirety. 
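
The allele-specific priming logic described above can be illustrated with a short Python sketch. It is not part of the original disclosure: the function names, the toy sequence and the mutation position are hypothetical, and the model assumes that a 3'-terminal mismatch completely blocks extension, which real reactions only approximate.

```python
def product_expected(gene: str, primer: str, site: int) -> bool:
    """Idealised allele-specific PCR: extension needs a 3'-terminal match.

    gene   -- strand written in the same sense as the primer (assumption)
    primer -- primer whose last (3') residue sits on the interrogated site
    site   -- 0-based index of the interrogated residue in `gene`
    """
    if site - len(primer) + 1 < 0:
        raise ValueError("primer does not fit upstream of the site")
    return gene[site].upper() == primer[-1].upper()


def mutant_primer_mix(wild_type_primer: str) -> list[str]:
    """Primers whose 3' residue is any nucleotide other than the wild-type one."""
    body, last = wild_type_primer[:-1], wild_type_primer[-1].upper()
    return [body + base for base in "ACGT" if base != last]


wild_type = "ATGGCAGCCTCTGGGG"            # toy sequence, not a real locus
site = 9                                   # hypothetical mutation position
mutant = wild_type[:site] + "A" + wild_type[site + 1:]
wt_primer = wild_type[site - 7:site + 1]   # 8-mer ending on the site

# Wild-type primer: product on the unmutated gene, none on the mutated gene.
print(product_expected(wild_type, wt_primer, site))
print(product_expected(mutant, wt_primer, site))
# Primer mixture: a product from any member indicates a mutation.
print(any(product_expected(mutant, p, site) for p in mutant_primer_mix(wt_primer)))
```

In this model the wild-type primer reports the unmutated gene by the presence of a product, while the primer mixture reports a mutation by the presence of a product, mirroring the two alternatives in the text.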

Alternatively, antibodies can be generated that selectively bind either mutated or non- 
mutated protein. The antibodies then can be used to screen tissue samples for occurrence of 
mutations in a manner analogous to the DNA-based methods described supra. 

The diagnostic methods described above can be used not only for diagnosis and for 
prognosis of existing disease, but may also be used to predict the likelihood of the future 
occurrence of disease. For example, clinically healthy patients can be screened for mutations 
in the inventive molecule that correlate with later disease onset. Such mutations may be 
observed in the heterozygous state in healthy individuals. In such cases a single mutation 
event can effectively disable proper functioning of the gene and induce a transformed or 
malignant phenotype. This screening also may be carried out prenatally or neonatally.

DNA molecules according to the invention also are well suited for use in so-called
"gene chip" diagnostic applications. Such applications have been developed by, inter alia, 
Synteni and Affymetrix. Briefly, all or part of the DNA molecules of the invention can be 
used either as a probe to screen a polynucleotide array on a "gene chip," or they may be 
immobilized on the chip itself and used to identify other polynucleotides via hybridization to 
the surface of the chip. In this manner, for example, related genes can be identified, or 
expression patterns of the gene in various tissues can be simultaneously studied. Such gene 
chips have particular application for diagnosis of disease, or in forensic analysis to detect the 
presence or absence of an analyte. Suitable chip technology is described, for example, in
Wodicka et al., Nature Biotechnology, 15:1359 (1997), which is hereby incorporated by
reference in its entirety, and references cited therein. 

PROTEIN-PROTEIN INTERACTIONS 

Due to their similarity to certain known proteins, it is anticipated that some of the 
inventive protein molecules will interact with another class of cellular proteins. This is 
particularly true of those molecules containing leucine zipper motifs.

Any method suitable for detecting protein-protein interactions can be employed for 
identifying interacting targets. Among the traditional methods which can be employed are co- 
immunoprecipitation, crosslinking and co-purification through gradients or chromatographic 
columns. Utilizing procedures such as these allows for the identification of GAP gene 
products. Once identified, a GAP protein can be used, in conjunction with standard 
techniques, to identify its corresponding pathway gene. For example, at least a portion of the 
amino acid sequence of the pathway gene product can be ascertained using techniques well 
known to those of skill in the art, such as via the Edman degradation technique (see, e.g.,
Creighton, 1983, PROTEINS: STRUCTURES AND MOLECULAR PRINCIPLES, W.H.
Freeman & Co., N.Y., pp. 34-49). The amino acid sequence obtained can be used as a guide
for the generation of oligonucleotide mixtures that can be used to screen for pathway gene 
sequences. Screening can be accomplished, for example, by standard hybridization or PCR 
techniques. Techniques for the generation of oligonucleotide mixtures and for screening are 
well-known. (See, e.g., Ausubel, supra, and PCR PROTOCOLS: A GUIDE TO METHODS
AND APPLICATIONS, 1990, Innis et al., eds. Academic Press, Inc., New York).

Additionally, methods can be employed which result in the simultaneous identification
of interacting target genes. One method which detects protein interactions in vivo, the two- 
hybrid system, is described in detail for illustration purposes only and not by way of 
limitation. One version of this system has been described (Chien et al., Proc. Natl. Acad.
Sci. USA, 88: 9578-9582 (1991)) and is commercially available from Clontech (Palo Alto, 
CA). 

Briefly, utilizing such a system, plasmids are constructed that encode two hybrid 
proteins: one consists of the DNA-binding domain of a transcription activator protein fused to 
a known protein, in this case an inventive protein, and the other contains the activator 
protein's activation domain fused to an unknown protein (a putative GAP, for instance) that is 
encoded by a cDNA which has been recombined into this plasmid as part of a cDNA library. 
The plasmids are transformed into a strain of the yeast Saccharomyces cerevisiae that contains 
a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's 
binding sites. Neither hybrid protein alone can activate transcription of the reporter gene:
the DNA-binding domain hybrid cannot because it does not provide activation function, and
the activation domain hybrid cannot because it cannot localize to the activator's binding sites.
Interaction of the two hybrid proteins reconstitutes the functional activator protein and results 
in expression of the reporter gene, which is detected by an assay for the reporter gene 
product. 

The two-hybrid system or related methodology can be used to screen activation 
domain libraries for proteins that interact with a known "bait" gene product. By way of 
example, and not by way of limitation, gene products known to be involved in TH cell 
subpopulation-related disorders and/or differentiation, maintenance, and/or effector function 
of the subpopulations can be used as the bait gene products. Total genomic or cDNA 
sequences are fused to the DNA encoding an activation domain. This library and a plasmid
encoding a hybrid of the bait gene product fused to the DNA-binding domain are 
cotransformed into a yeast reporter strain, and the resulting transformants are screened for 
those that express the reporter gene. For example, and not by way of limitation, the bait gene 
can be cloned into a vector such that it is translationally fused to the DNA encoding the DNA- 
binding domain of the GAL4 protein. Colonies that express the reporter gene are purified and
the library plasmids responsible for reporter gene expression are isolated. DNA sequencing is
then used to identify the proteins encoded by the library plasmids.

The present invention, thus generally described, will be understood more readily by
reference to the following examples, which are provided by way of illustration and are not 
intended to be limiting of the present invention. 

The examples below are provided to illustrate the subject invention. These examples 
are provided by way of illustration and are not included for the purpose of limiting the 
invention. 

EXAMPLES 

EXAMPLE I: cDNA Library Construction 

cDNA library plates and clones originated from five cDNA libraries that were 
constructed by directional cloning. These are available through the Resource Center 
(http://www.rzpd.de) of the German Genome Project. In particular, the hfbr2 (human fetal 
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney; DKFZp566) libraries were 
generated using the Smart kit (Clontech), except that PCR was carried out with primers that 
contained uracil residues to permit directional cloning without restriction digestion and 
ligation, and were complementary with the pAMP1 (LifeTechnologies) cloning sites for
directional cloning. The htes3 (human testes; DKFZp434), hute1 (human uterus; DKFZp586)
and hmcf1 (human mammary carcinoma; DKFZp727) libraries are conventional (Gubler, U.,
Hoffman, B.J., (1983), A simple and very efficient method for generating cDNA libraries.
Gene 25, 263-269), size-selected cDNA libraries. They are cloned into pSPORT1
(LifeTechnologies) via a NotI site, which is introduced during reverse transcription
downstream of the oligo dT primer, and a SalI site that is introduced by the ligation of
adapters. The human mammary carcinoma library was constructed from MCF7 cells.

The cDNA sequences of this application were first identified among the sequences 
comprising various libraries. Technology has advanced considerably since the first cDNA 
libraries were made. Many small variations in both chemicals and machinery have been 
instituted over time, and these have improved both the efficiency and safety of the process. 
Although the cDNAs could be obtained using an older procedure, the procedure presented in 
this application is exemplary of one currently being used by persons skilled in the art. For the
purpose of providing an exemplary method, the mRNA isolation and cDNA library
construction described here is for the MCF-7 library (DKFZp727) from which the clones
named DKFZphmcf1 xxyyxx were obtained.

The human cell line MCF-7 was grown in DMEM supplemented with 10% fetal calf 
serum until confluency. 3 x 10^8 cells were harvested with a cell scraper in PBS. Cells were
lysed in buffer containing 0.5% NP-40 to leave the nuclei intact. The debris was pelleted by
centrifugation at 15,000 x g for 10 minutes at 4 degrees Celsius. Proteins in the supernatant
were degraded in the presence of SDS and Proteinase K (30 minutes at 56 degrees Celsius).
Proteins were removed by phenol/chloroform extraction, and RNA was precipitated
from the aqueous phase with Na-acetate and ethanol. Polyadenylated messages were isolated
using Qiagen Oligotex (QIAGEN, Hilden Germany). 

First strand cDNA synthesis was accomplished using an oligo (dT) primer which also
contained a NotI restriction site. Second strand synthesis was performed using a
combination of DNA polymerase I, E. coli ligase and RNase H, followed by the addition of a
SalI adaptor to the blunt-ended cDNA. The SalI-adapted, double-stranded cDNA was then
digested with NotI restriction enzyme, and fractionated by size on an agarose gel. DNA of the
appropriate size was cut from the gel and cast into a second gel at a 90° angle. After
electrophoresis in the second dimension, cDNA of the appropriate size was cut from the gel.
The agarose block was digested with gelase. The cDNA was purified by
two phenol extractions and an ethanol precipitation. The cDNA was ligated into SalI/NotI
pre-digested pSPORT1 vector (LifeTechnologies) and transformed into DH10B bacteria.

The libraries were arrayed into 384-well microtiter plates and spotted on high density
nylon membranes for hybridization analysis. Filters and clones are available through the 
Resource Center. Whole plates were distributed to the sequencing partners of the consortium 
for systematic sequencing. 

EXAMPLE II: Sequencing of cDNA Clones 

All clones in the 384-well microtiter plates were sequenced from the 5' end.
Sequencing was done preferentially using dye terminator chemistry (ABD or Amersham) on
ABI automated DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL
prototype instruments (Arakis), mainly with dye primer chemistry.



The resulting expressed sequence tag (EST) sequences ("r1 ESTs" = sequenced from the
5' end) were analysed for:

a) the lack of identical matches with known genes. 

For this, the EST sequence was compared by BLAST against the cDNA consortium's own
database and then against public databases (with BLASTn and BLASTx against
EMBL/EMBLNEW and assembled ESTs; please refer to EXAMPLE III: Bioinformatics
analysis of full length cDNAs, for description and parameter settings). ESTs which were
identical to known genes over more than 100 bp, with fewer than 2 mismatches, were excluded
from further analysis.

b) the presence of an open reading frame 

Open reading frames (ORFs) were detected with a tool developed by the Munich
Information Center for Protein Sequences (MIPS) called ORF-map. ORF-map visualises
potential start and stop codons. If an ORF without a stop codon was detected in an r1 EST,
the sequence was processed further.

c) the presence of GC rich sequences 

A script developed by MIPS computed the GC content of the r1 sequence, which
should be >40%. Writing similar scripts is within the ordinary skill of one in bioinformatics.

d) the lack of repeat structures 

Repeats such as Alu, LINE or CA repeats were detected by BLAST searches (BLASTn and
BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, for
description and parameter settings) against a repeat database compiled by MIPS. If a repeat
was present within the r1 sequence, the sequence was not processed further.
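
Criteria b) and c) above lend themselves to a short script. The following Python sketch is offered only as an illustration of the kind of script referred to, not as the MIPS implementation: the function names are hypothetical, and "an ORF without a stop codon" is interpreted here, as an assumption, to mean that at least one of the three forward reading frames of the 5' EST is free of stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C residues in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_open_frame(seq: str) -> bool:
    """True if at least one forward frame contains no stop codon."""
    seq = seq.upper()
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def passes_est_filter(seq: str, min_gc: float = 0.40) -> bool:
    """Combine criterion b) (open frame) and criterion c) (GC content > 40%)."""
    return gc_content(seq) > min_gc and has_open_frame(seq)


# Toy inputs: the first is GC rich with an open frame, the second is AT rich.
print(passes_est_filter("ATGGCAGCCTCTGGGGTGGAG"))
print(passes_est_filter("ATATTAATAGATTTAAAT"))
```

The known-gene and repeat criteria a) and d) depend on database searches and are therefore not covered by this sketch.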

Novel clones that met all criteria were identified to the sequencers, who then 
performed 3'-end sequencing of these clones. The resulting 3' ESTs ("s1 ESTs" = sequenced
from the 3' end) were checked for:

a) the lack of matches with known genes in public databases, and sequences already
generated by us.



This was done by blasting against EMBL/EMBLNEW and assembled EST (BLASTn 
and BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, 
for description and parameter settings). 

b) the presence of polyadenylation signals. 

Again, only clones matching the selection criteria were chosen to be sequenced
completely by the sequencers. Clones were selected according to the following criteria:

A "very good ORF" had at least one BLASTx match to other proteins. A "good ORF"
should extend to the 3' end and be longer than ~40 codons. If the ORF started in the r1
sequence, there should not be too many competing start codons upstream of, and in frame
with, the potential ORF start codon, and the start should match the Kozak consensus
ATG. If the EST sequence was too short to decide on the potential ORF, and there
were only a few or no start codons in the sequence, the GC content of the sequence should be
greater than 40%. The r1 sequences need not contain a polyA tail at the 3' end. In
addition, the results of the BLAST search against the assembled human ESTs could help in
questionable cases to decide whether to stop or to continue. A hit against these ESTs was an
indication to go further.
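
The start-codon criterion can be sketched in Python as follows. This is an illustrative simplification rather than the procedure actually used: the function name and the three-level grading are hypothetical, and the check is reduced to the two positions generally regarded as most important in the Kozak context, a purine at position -3 and a G at position +4 relative to the A of the ATG.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Grade the context of the ATG that starts at 0-based `atg_index`.

    "strong"   -- purine at -3 and G at +4
    "adequate" -- only one of the two positions matches
    "weak"     -- neither position matches
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    purine_minus3 = atg_index >= 3 and seq[atg_index - 3] in "AG"
    g_plus4 = atg_index + 3 < len(seq) and seq[atg_index + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


# Context of the start codon of the cDNA listed later in this document
# (nucleotides 231-250 of the insert).
context = "TGAGGGAGGATGGCAGCCTC"
print(kozak_strength(context, context.find("ATG")))
```

Counting competing in-frame start codons upstream of the candidate ATG would be a natural extension of the same function.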

Clones passing the above-described screening were sequenced in full. Sequencing was 
done preferentially using dye terminator chemistry (ABD or Amersham) on ABI automated 
DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL prototype
instruments (Arakis), mainly with dye primer chemistry. Primer walking (Strauss et al., 1986,
Specific-primer-directed DNA sequencing. Anal Biochem. 154, 353-360) was the preferred 
sequencing strategy because of the lower redundancy possible compared to random shotgun 
(Messing, J., Crea, R., Seeburg, H.P. (1981) A system for shotgun DNA sequencing. Nucleic 
Acids Res. 9, 32-39) methods. Walking primers were generally designed using software (e.g.,
Haas, S., Vingron, M., Poustka, A., Wiemann, S. (1998) Primer design in large-scale
sequencing. Nucleic Acids Res. 26, 3006-3012; Schwager, C., Wiemann, S., Ansorge, W.
(1995) GeneSkipper: integrated software environment for DNA sequence assembly and
alignment. HUGO Genome Digest 2, 8-9) that permitted complete automation of this usually
time-consuming process and helped in the parallel processing of large numbers of clones.



EXAMPLE III: Bioinformatics analysis of full length cDNAs 

Each sequence obtained was compared at the nucleotide level in a stepwise manner to
sequences in EMBL/EMBLNEW, EMBL-EST and EMBL-STS using the BLASTn algorithm.
Basic Local Alignment Search Tool (BLAST, Altschul S. F. (1993) J Mol Evol 36:290-300; 
Altschul, S. F. et al (1990) J Mol Biol 215:403-10) is used to search for local sequence 
alignments. BLAST produces alignments of both nucleotide (BLASTn) and amino acid 
sequences (BLASTp or BLASTx) to determine sequence similarity. BLAST is especially 
useful in determining exact matches or in identifying homologs, because of the local nature of 
the alignments. While it is useful for matches which do not contain gaps, it is inappropriate 
for performing motif-style searching. The fundamental unit of BLAST algorithm output is the 
High-scoring Segment Pair (HSP). 

An HSP consists of two sequence fragments of arbitrary but equal lengths whose
alignment is locally maximal and for which the alignment score meets or exceeds a
threshold or cutoff score set by the user. The BLAST approach is to look for HSPs between a
query sequence and a database sequence, to evaluate the statistical significance of any matches
found, and to report only those matches which satisfy the user-selected threshold of significance. The
parameter E establishes the statistically significant threshold for reporting database sequence
matches. E is interpreted as the upper bound of the expected frequency of chance occurrence
of an HSP (or set of HSPs) within the context of the entire database search. Any database
sequence whose match satisfies E is reported in the program output. Parameter settings for
the BLAST operations (BLASTN 2.0a19MP-WashU) described were: EMBL-EMBLNEW:
H=0 V=5 B=5 -filter seg; EMBL-EST: H=0 E=1e-10 B=500 V=500 -filter seg; EMBL-STS:
H=0 V=5 B=5.
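
The role of the parameter E can be illustrated with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where S is the HSP score, m the query length and n the database length. The Python sketch below is a simplified illustration and not the WU-BLAST implementation: the function names are hypothetical, and the values of K and lambda, which depend on the scoring system, are arbitrary placeholders.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 k: float, lam: float) -> float:
    """Expected number of chance HSPs scoring at least `score`."""
    return k * query_len * db_len * math.exp(-lam * score)


def p_from_e(e_value: float) -> float:
    """Probability of at least one such chance HSP: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def reportable(hsps, e_cutoff, query_len, db_len, k, lam):
    """Keep (name, score) pairs whose E-value does not exceed the cutoff."""
    return [(name, score) for name, score in hsps
            if expect_value(score, query_len, db_len, k, lam) <= e_cutoff]


# Placeholder statistics: K = 0.1 and lambda = 0.3 are assumptions.
hits = [("weak_hit", 50), ("strong_hit", 200)]
print(reportable(hits, 1e-10, query_len=500, db_len=10**9, k=0.1, lam=0.3))
```

For small E the P-value and the E-value are nearly identical, which is why the two figures often coincide in BLAST reports.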

Search against EMBL/EMBLNEW was done to determine whether the cDNAs are 
already known, and also to find out whether the cDNAs are encoded by genomic sequences 
already sequenced and published/submitted to these databases.

Search against EMBL-EST was performed to get a first impression of how abundant a
particular cDNA would be and to get information on tissue specificity (so-called "electronic
Northern blot"; e.g., some of the cDNAs derived from the testis library show hits only to ESTs
also derived from testis libraries).

The cDNA sequences were compared by BLAST against EMBL-STS to identify STS sequence
matches to the cDNA, thus providing mapping information for the new cDNA.

The potential protein-sequences were generated automatically by a script searching 
for the longest open reading frame (ORF) in each of the three forward frames with a 
minimum length of 90 codons. Next, the automatically generated ORFs were translated into 
protein sequences. These protein sequences were searched against the non-redundant protein
data set of PIR/SwissProt/Trembl/Tremblnew (BLASTP 2.0a19MP-WashU, parameter
setting: V=7 B=7 H=0 -filter seg). If the script generated more than one ORF, one ORF was
chosen manually by the annotator according to the degree of similarity to known proteins, the
location of the ORF in the cDNA, the length, the amino acid composition and the content of
Prosite motifs.
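
A minimal Python sketch of such an ORF-finding script is given below. It is not the script actually used: the function names are hypothetical, an ORF is assumed to run from an ATG to the next in-frame stop codon, ORFs that run off the end of the sequence are ignored, and translation uses the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG ordering.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def longest_orf_per_frame(seq: str, min_codons: int = 90) -> dict:
    """Return {frame: (start, stop)} for the longest ATG..stop ORF per frame.

    `start` is the index of the ATG, `stop` the index of the stop codon;
    only ORFs of at least `min_codons` codons (stop excluded) are kept.
    """
    seq = seq.upper()
    best = {}
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                length = (i - start) // 3
                previous = best.get(frame)
                if length >= min_codons and (
                        previous is None
                        or length > (previous[1] - previous[0]) // 3):
                    best[frame] = (start, i)
                start = None
    return best


def translate(seq: str) -> str:
    """Translate a nucleotide sequence; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


# First ten codons of the ORF of the cDNA listed later in this document.
print(translate("ATGGCAGCCTCTGGGGTGGAGAAGAGCAGC"))
```

Where more than one frame yields an ORF, the sketch simply returns all of them, leaving the choice to the annotator as described above.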

Additionally, a BLASTx search (BLASTX 2.0a19MP-WashU against a non-redundant
protein database comprising PIR/SWISSPROT/TREMBL/TREMBLNEW;
parameter settings were: matrix/home/data/blast/matrix/aa/BLOSUM62 H=0 V=5 B=5 -filter
seg) was performed to find potential frame shifts in the complementary cds of the cDNAs and to
identify unspliced or partly spliced cDNAs. The protein sequence was then transferred to the
PEDANT system, in order to generate additional information on the new proteins. PEDANT 
(Protein Extraction, Description, and ANalysis Tool, Frishman, D. & Mewes, H.-W. (1997) 
PEDANTic genome analysis. Trends in Genetics , 13, 415-416) is a platform developed at the 
Munich Information Center for Protein Sequences (MIPS, Munich, Germany), which 
incorporates practically all bioinformatics methods important for the functional and structural 
characterisation of protein sequences. Computational methods used by PEDANT are:

FASTA

Very sensitive protein sequence database searches with estimates of statistical
significance. Pearson W.R. (1990) Rapid and sensitive sequence comparison with FASTP 
and FASTA. Methods Enzymol. 183, 63-98. 

BLAST2 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Altschul S.F., Gish W., Miller W., Myers E.W., and Lipman D.J. Basic local 
alignment search tool. Journal of Molecular Biology 215, 403-10. 

PREDATOR 

High-accuracy secondary structure prediction from single and multiple sequences. 
Frishman, D. and Argos, P. (1997) 75% accuracy in protein secondary structure prediction. 
Proteins, 27, 329-335. Frishman, D. and Argos, P.(1996) Incorporation of long-distance 
interactions in a secondary structure prediction algorithm. Prot. Eng. 9, 133-142. 

STRIDE 

Secondary structure assignment from atomic coordinates. Frishman, D. and Argos, 
P. (1995) Knowledge-based secondary structure assignment. Proteins 23, 566-579. 

CLUSTALW 

Multiple sequence alignment. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) 
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through 
sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids 
Research, 22:4673-4680. 

TMAP 

Transmembrane region prediction from multiply aligned sequences. Persson, B. and 
Argos, P. (1994) Prediction of transmembrane segments in proteins utilising multiple 
sequence alignments. J. Mol. Biol. 237, 182-192. 




ALOM2 




Transmembrane region prediction from single sequences. Klein, P., Kanehisa, M., 
and DeLisi, C. Prediction of protein function from sequence properties: A discriminant 
analysis of a database. Biochim. Biophys. Acta 787, 221-226 (1984). Version 2 by Dr. K. 
Nakai. 

SIGNALP 

Signal peptide prediction. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G.
(1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their 
cleavage sites. Protein Engineering 10, 1-6. 

SEG 

Detection of low complexity regions in protein sequences. Wootton, J.C., Federhen, 
S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. 
Computers & Chemistry 17, 149-163. 

COILS 

Detection of coiled coils. Lupas, A., M. Van Dyke, and J. Stock, "Predicting Coiled 
Coils from Protein Sequences." Science (1991) 252, 1162-1164.

PROSEARCH 

Detection of PROSITE protein sequence patterns. Kolakowski L.F. Jr., Leunissen 
J.A.M., Smith J.E. (1992) ProSearch: fast searching of protein sequences with regular 
expression patterns related to protein structure and function. Biotechniques 13, 919-921. 

BLIMPS 

Similarity searches against a database of ungapped blocks. J.C. Wallace and Henikoff 
S., (1992) PATMAT: a searching and extraction program for sequence, pattern and block
queries and databases, CABIOS 8, 249-254. Written by Bill Alford. 

HMMER

Hidden Markov model software. Sonnhammer E.L.L., Eddy S.R., Durbin R. (1997)
Pfam: A Comprehensive Database of Protein Families Based on Seed Alignments. Proteins 
28, 405-420. 

pI

Perl script that returns the amino acid composition, molecular weight, theoretical pI, and
expected extinction coefficient of an amino acid sequence. By Fred Lindberg.

The parameter settings were as follows: known3d: score > 100; BLAST: E-value < 10; SCOP: <=
50 alignments, E-value < 0.0001; signalp: Y=0.7, examined from the N-terminus: 50 aa;
funcat: E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0; COILS: threshold
0.95; SEG: threshold 20.0; BLAST in report: E-value < 0.001; PIR-KW, superfamilies, EC
numbers in report: E-value < 0.00001; known3d in report: score > 120.
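
Two of the quantities returned by the pI script described above, amino acid composition and molecular weight, can be computed with a few lines of Python. This sketch is not the Perl script itself: the names are hypothetical, the residue masses are commonly tabulated average values and should be treated as approximate, and the theoretical pI and extinction coefficient are deliberately left out.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free termini


def composition(protein: str) -> Counter:
    """Count each residue in the sequence."""
    return Counter(protein.upper())


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


# First ten residues of the protein listed later in this document.
fragment = "MAASGVEKSS"
print(composition(fragment).most_common(3))
print(round(molecular_weight(fragment), 2))
```

Residues outside the twenty standard amino acids raise a KeyError in this sketch, which makes unexpected characters in a sequence easy to spot.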

The results of PEDANT analysis, together with the results of the similarity searches, 
constitute the basis for the structural and functional annotation of the cDNAs and the encoded 
proteins, as specified below. 



EXAMPLE IV: CELLULAR LOCALIZATIONS OF GFP-FUSION PROTEINS

Plasmids of cDNA-GFP fusions were transfected into mammalian tissue culture cells 
and allowed to express the proteins for up to 48 hours. Live cells were imaged at 24 hours 
and 48 hours after transfection and the localisations recorded. The chart, below, depicts the 
apparent final cellular localisations of 107 cDNA-GFP fusions. 

In order to minimize the possibility of the GFP interfering with protein function 
and/or localization, two separate populations of cDNAs were generated encoding N-terminal 
or C-terminal GFP fusions. Clearly this appears to be a crucial strategy, since overall only 
56% of the proteins localised to a specific compartment irrespective of the position of the 
GFP. In the instances where only one fusion localized, the complementary fusion either gave 
no expression or a nuclear and cytosolic staining - characteristic for GFP alone expression. 

Each cDNA in turn was subjected to bioinformatic analysis. Where possible, the 
potential subcellular localisations of the expressed proteins were determined. This
[...] GFP-fusion proteins in mammalian cells.

group: Cell structure and motility

DKFZphfbr2_16c16.3 encodes a novel 586 amino acid protein with similarity to the human actin
binding protein MAYVEN and Drosophila Kelch.

MAYVEN is a novel actin binding protein predominantly expressed in brain. Drosophila kelch is
involved in the maintenance of ring canal organization during oogenesis. The amino half of the
protein including the BTB domain mediates dimerization, while the amino half might allow
cross-linking of ring canal actin filaments, thus organising the inner rim cytoskeleton. The
kelch repeat domain is necessary for ring canal localisation and believed to mediate an
additional interaction, possibly with actin. The new protein shares the features of both
proteins and therefore should be involved in the organisation of cytoskeleton binding to
membrane proteins.

The new protein can find application in modulating/blocking of cytoskeleton-membrane protein
interaction.



similarity to Drosophila kelch 
complete cDNA, complete cds, EST hits 

on genomic level partly encoded by AC005082 and AC006039 

Sequenced by Qiagen 

Locus: unknown

Insert length: 3028 bp

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2984



1 GGGGGCCCGG GGACGCAGCC CAGTTGGTAG CGTCGCTCCC TGAGCGTTTC
51 TAAGGGGGCC GCCCGGCCCT GTCTTTCGGC AGTGGCCGAG CCACCGCCGC
101 CTGCCGCGCG TTCCAGAGCT GGGCGCTGCA GCTGCACTGC CGATCGCCGT
151 GTTTGGTCGA TAGAATCCCC AGTGTGCCCA GAGAGTGCGA CCCCTCGCCC
201 GGCCCGGCGA GCCCCGGGCG TGAACCGAGC TGAGGGAGGA TGGCAGCCTC
251 TGGGGTGGAG AAGAGCAGCA AGAAGAAGAC CGAGAAGAAA CTTGCTGCTC
301 GGGAAGAAGC TAAATTGTTG GCGGGTTTCA TGGGCGTCAT GAATAACATG
351 CGGAAACAGA AAACGTTGTG TGACGTGATC CTCATGGTCC AGGAAAGAAA
401 GATACCTGCT CATCGTGTTG TTCTTGCTGC AGCCAGTCAT TTTTTTAACT
451 TAATGTTCAC AACTAACATG CTTGAATCAA AGTCCTTTGA AGTAGAACTC
501 AAAGATGCTG AACCTGATAT TATTGAACAA CTGGTGGAAT TTGCTTATAC
551 TGCTAGAATT TCCGTGAATA GCAACAATGT TCAGTCTTTG TTGGATGCAG
601 CAAACCAATA TCAGATTGAA CCTGTGAAGA AAATGTGTGT TGATTTTTTG
651 AAAGAACAAG TTGATGCTTC AAATTGTCTT GGTATAAGTG TGCTAGCGGA
701 GTGTCTAGAT TGTCCTGAAT TGAAAGCAAC TGCAGATGAC TTTATTCATC
751 AGCACTTTAC TGAAGTTTAC AAAACTGATG AATTTCTTCA ACTTGATGTC
801 AAGCGAGTAA CACATCTTCT CAACCAGGAC ACTCTGACTG TGAGAGCAGA
851 GGATCAGGTT TATGATGCTG CAGTCAGGTG GTTGAAATAC GATGAGCCTA
901 ATCGCCAGCC ATTTATGGTT GATATCCTTG CTAAAGTCAG GTTTCCTCTT
951 ATATCAAAGA ATTTCTTAAG TAAAACGGTA CAAGCTGAAC CACTTATTCA
1001 AGACAATCCT GAATGCCTTA AGATGGTGAT AAGTGGAATG AGGTACCATC
1051 TACTGTCTCC AGAGGACCGA GAAGAACTTG TAGATGGCAC AAGACCTAGA
1101 AGAAAGAAAC ATGACTACCG CATAGCCCTA TTTGGAGGCT CTCAACCACA
1151 GTCTTGTAGA TATTTTAACC CAAAGGATTA TAGCTGGACA GACATCCGCT
1201 GCCCCTTTGA AAAACGAAGA GATGCAGCAT GCGTGTTTTG GGACAATGTA
1251 GTATACATTT TGGGAGGCTC TCAGCTTTTC CCAATAAAGC GAATGGACTG
1301 CTATAATGTA GTGAAGGATA GCTGGTATTC GAAACTGGGT CCTCCGACAC
1351 CTCGAGACAG CCTTGCTGCA TGTGCTGCAG AAGGCAAAAT TTATACATCT
1401 GGAGGTTCAG AAGTAGGAAA CTCAGCTCTG TATTTATTTG AGTGCTATGA
1451 TACGAGAACT GAAAGCTGGC ACACAAAGCC CAGCATGCTG ACCCAGCGCT
1501 GCAGCCATGG GATGGTGGAA GCCAATGGCC TAATCTATGT TTGTGGTGGA
1551 AGTTTAGGAA ACAATGTTTC AGGGAGAGTG CTTAATTCCT GTGAAGTTTA
1601 TGATCCTGCC ACAGAAACAT GGACTGAGCT GTGTCCAATG ATTGAAGCCA
1651 GGAAGAATCA TGGGCTGGTA TTTGTAAAAG ACAAGATATT TGCTGTGGGT
1701 GGTCAGAATG GTTTAGGTGG TCTGGACAAT GTGGAATATT ACGATATTAA
1751 GTTGAACGAA TGGAAGATGG TCTCACCAAT GCCATGGAAG GGTGTAACAG
1801 TGAAATGTGC AGCAGTTGGC TCTATAGTTT ATGTCTTGGC TGGTTTTCAG
2301 AGAAGATTGG CTCATCAGTG AAGCGCAGTA TCTTAGCTCT AGATTCTATT
2351 TTCATGCATC ACAGAAGTGC TATACGGTTA GGTCTGTTTG TGCTCAGTCA
2401 AGAACTAAGA AATAGTATGA ATTGTAAGTC AAGATGGGCA ACTCAGATGG
2451 AGCAGCTTAG TCTCACAGTT TGCTTGTCTA TTTATTTTAT TTAGTGCCAA
2501 ATGTATTCCA TTTTAAAAGT AAGCCAGAGT GAGTCAAGGC ATATACACAC
2551 TTTCTCACAA AACTTCCTAA ACAGATTTGG GGGTTTAATA TGTCCAACTC
2601 CTCATGAAAT ATATTCAATC CACTTAAATA TATTCCATCT TTTTAACATA
2651 AAATGTAAAG CTTAGCACCC ATCATTAATT TATGTCTCTG TTTTATCCAG
2701 TGGTTAAAAA AGGATTCTGC CTCTTTAGTC CTCACTGTTA AATAAAACCC
2751 AATCATAGTA AGTGATTAAC TAGCAAAAAG TAAAGCTATT TATAGCAAAT
2801 TTCTAGATCA TTAGAAAAGC ACTGGTAGTT GTACAATATC AGTGTTGACT
2851 TTGAACTTCT TTAACGAGAT CATGAATTCT TTTCCCTTAG CCAAAACATG
2901 AAATATTTAA CCTAGTTGTC TCTAAAAGTT TTGTAATCAT GAGTTAGATA
2951 TATGTCATCT CCTATTCATT GCTTTTATGT GATCAATAAA TCTTTTACAA
3001 ACCCAAAAGA AAAAAAAAAA AAAAAAAA



BLAST Results 



Entry AC005082 from database EMBL: 

Homo sapiens clone RG271G13; HTGS phase 1, 7 unordered pieces. 
Score = 6460, P = 0.0e+00, identities = 1292/1292 

4 exons matching Bp 1180-3007 

Entry AC006039 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens clone NH0319F03; HTGS phase
1, 3 unordered pieces. 

Score = 1780, P = 2.0e-117, identities = 368/377 

5 exons matching Bp 6-860 

Entry HSG20603 from database EMBL: 
human STS A005Y34. 
Score = 670, P = 1.0e-23, identities = 134/134 



Medline entries 



93201592 : 

kelch encodes a component of intercellular bridges in 
Drosophila egg chambers. 

97412177: 

Drosophila kelch is an oligomeric ring canal actin organizer. 



Peptide information for frame 3 



ORF from 240 bp to 1997 bp; peptide length: 586 
Category: strong similarity to known protein 



1 MAASGVEKSS KKKTEKKLAA REEAKLLAGF MGVMNNMRKQ KTLCDVILMV
51 QERKIPAHRV VLAAASHFFN LMFTTNMLES KSFEVELKDA EPDIIEQLVE
101 FAYTARISVN SNNVQSLLDA ANQYQIEPVK KMCVDFLKEQ VDASNCLGIS
151 VLAECLDCPE LKATADDFIH QHFTEVYKTD EFLQLDVKRV THLLNQDTLT
201 VRAEDQVYDA AVRWLKYDEP NRQPFMVDIL AKVRFPLISK NFLSKTVQAE
251 PLIQDNPECL KMVISGMRYH LLSPEDREEL VDGTRPRRKK HDYRIALFGG
301 SQPQSCRYFN PKDYSWTDIR CPFEKRRDAA CVFWDNVVYI LGGSQLFPIK
351 RMDCYNVVKD SWYSKLGPPT PRDSLAACAA EGKIYTSGGS EVGNSALYLF
401 ECYDTRTESW HTKPSMLTQR CSHGMVEANG LIYVCGGSLG NNVSGRVLNS
451 CEVYDPATET WTELCPMIEA RKNHGLVFVK DKIFAVGGQN GLGGLDNVEY
501 YDIKLNEWKM VSPMPWKGVT VKCAAVGSIV YVLAGFQGVG RLGHILEYNT
551 ETDKWVANSK VRAFPVTSCL ICVVDTCGAN EETLET



BLASTP hits 



Entry KELC_DROME from database SWISSPROT: 
RING CANAL PROTEIN (KELCH PROTEIN). 
Length = 689 

Score = 816 (287.2 bits), Expect = 1.9e-81, P = 1.9e-81 
Identities = 187/542 (34%), Positives = 290/542 (53%) 

Entry AC004021_1 from database TREMBL: 

"WUGSC:H_DJ0186K10.1"; Human PAC clone DJ0186K10 from 5q31, 
complete sequence. Homo sapiens (human) 
Length = 497 

Score = 704 (247.8 bits), Expect = 1.4e-69, P = 1.4e-69 
Identities = 163/483 (33%), Positives = 253/483 (52%) 

Entry HSDKG12_1 from database TREMBL: 

"KIAA0132"; Human mRNA for KIAA0132 gene, complete cds. Homo 
sapiens (human) 
Length = 624 

Score = 692 (243.6 bits), Expect = 2.6e-68, P = 2.6e-68 
Identities = 175/527 (33%), Positives = 272/527 (51%) 

Entry A45773 from database PIR: 

kelch protein, long form - fruit fly (Drosophila melanogaster) 
Length = 1476 

Score = 817 (287.6 bits), Expect = 1.7e-80, P = 1.7e-80 
Identities = 189/549 (34%), Positives = 292/549 (53%) 
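The percentages printed next to the identity and positive counts in the hits above can be recomputed from the counts. The sketch below is an illustration only: `check_percentages` is a hypothetical helper, and the assumption that the percentage is the ratio truncated (not rounded) to a whole number is an inference from the figures printed in this report, not a statement about the BLAST program itself.

```python
import re

# Matches lines such as "Identities = 187/542 (34%), Positives = 290/542 (53%)".
HIT_LINE = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), Positives = (\d+)/(\d+) \((\d+)%\)"
)


def check_percentages(line):
    """Return True if both printed percentages equal the truncated ratios."""
    match = HIT_LINE.search(line)
    if match is None:
        raise ValueError("not an Identities/Positives line")
    ident, n_ident, pct_ident, pos, n_pos, pct_pos = map(int, match.groups())
    return (ident * 100 // n_ident == pct_ident
            and pos * 100 // n_pos == pct_pos)


hit_lines = [
    "Identities = 187/542 (34%), Positives = 290/542 (53%)",
    "Identities = 163/483 (33%), Positives = 253/483 (52%)",
    "Identities = 175/527 (33%), Positives = 272/527 (51%)",
    "Identities = 189/549 (34%), Positives = 292/549 (53%)",
]
results = [check_percentages(line) for line in hit_lines]
print(results)
```

Such a check is useful mainly for catching transcription errors in the counts, since a single wrong digit usually changes the truncated percentage.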



Alert BLASTP hits for DKFZphfbr2_16cl6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16cl6, frame 3 



Report for DKFZphfbr2_16cl6.3 



[LENGTH] 586 

[MW] 65992.06 

[pI] 6.08 

[HOMOL] PIR:A45773 kelch protein, long form - fruit fly (Drosophila melanogaster) 5e-85 

[BLOCKS] BL00075D Dihydrofolate reductase proteins 

[SCOP] d1gog_3 2.46.1.1.1 (151-537) Galactose oxidase, central domai 6e-36 

[PIRKW] zinc finger 2e-11 

[PIRKW] DNA binding 9e-10 

[PIRKW] transcription factor 1e-06 

[SUPFAM] A55R protein middle region homology 1e-35 

[SUPFAM] POZ domain homology 1e-35 

[SUPFAM] vaccinia virus 59K HindIII-C protein 5e-15 

[SUPFAM] A55R protein 1e-35 

[SUPFAM] myxoma virus M9-R protein 2e-11 

[SUPFAM] A55R protein carboxyl-terminal homology 1e-35 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] MYRISTYL 8 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 11 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 3.75 % 





SEQ MAASGVEKSSKKKTEKKLAAREEAKLLAGFMGVMNNMRKQKTLCDVILMVQERKIPAHRV 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeccccchhhhhe 

SEQ VLAAASHFFNLMFTTNMLESKSFEVELKDAEPDIIEQLVEFAYTARISVNSNNVQSLLDA 

SEG 

PRD eeccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhheeeeccchhhhhhhh 

SEQ ANQYQIEPVKKMCVDFLKEQVDASNCLGISVLAECLDCPELKATADDFIHQHFTEVYKTD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EFLQLDVKRVTHLLNQDTLTVRAEDQVYDAAVRWLKYDEPNRQPFMVDILAKVRFPLISK 

SEG 

PRD hhhchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhccch 

SEQ NFLSKTVQAEPLIQDNPECLKMVISGMRYHLLSPEDREELVDGTRPRRKKHDYRIALFGG 

SEG 

PRD hhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccccccccceeeeeeecc 

SEQ SQPQSCRYFNPKDYSWTDIRCFFEKRRDAACVFWDNVVYILGGSQLFPIKRMDCYNVVKD 

SEG 

PRD ccccceeeccccccccccccccccccceeeeeeeceeeeeeccccccccceeeecccccc 

SEQ SWYSKLGPPTPRDSLAACAAEGKIYTSGGSEVGNSALYLFECYDTRTESWHTKPSMLTQR 

SEG 

PRD cccccccccccccceeeeeccceeeeeccccccccceeeeeecccccccccccccccccc 
SEQ CSHGMVEANGLIYVCGGSLGNNVSGRVLNSCEVYDPATETWTELCPMIEARKNHGLVFVK 

SEG 

PRD ccceeeecceeeeeecccccccccccccceeeeccccccccccccccccccccceeeeec 

SEQ DKIFAVGGQNGLGGLDNVEYYDIKLNEWKMVSPMPWKGVTVKCAAVGSIVYVLAGFQGVG 

SEG 

PRD ceeeecccccccccccceeeccccccceeecccccccccceeeeeccceeeeeccccccc 

SEQ RLGHILEYNTETDKWVANSKVRAFPVTSCLICVVDTCGANEETLET 

SEG 

PRD cccceeecccccccccccccccccccceeeeeeeeccccccccccc 



Prosite for DKFZphfbr2_16cl6.3 



PS00001 442->446 ASN_GLYCOSYLATION PDOC00001 
PS00004 11->15 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 188->192 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 
PS00005 10->13 PKC_PHOSPHO_SITE PDOC00005 
PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 418->421 PKC_PHOSPHO_SITE PDOC00005 
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 
PS00005 520->523 PKC_PHOSPHO_SITE PDOC00005 
PS00005 552->555 PKC_PHOSPHO_SITE PDOC00005 
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 315->319 CK2_PHOSPHO_SITE PDOC00006 
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006 
PS00006 405->409 CK2_PHOSPHO_SITE PDOC00006 
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006 
PS00006 550->554 CK2_PHOSPHO_SITE PDOC00006 
PS00007 202->209 TYR_PHOSPHO_SITE PDOC00007 
PS00008 5->11 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 389->395 MYRISTYL PDOC00008 
PS00008 424->430 MYRISTYL PDOC00008 
PS00008 436->442 MYRISTYL PDOC00008 
PS00008 440->446 MYRISTYL PDOC00008 
PS00008 487->493 MYRISTYL PDOC00008 
PS00008 493->499 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16cl6.3) 

group: brain derived 

DKFZphfbr2_16f21 encodes a novel 208 amino acid protein with strong similarity to human zinc 
finger protein 216. 

The novel protein shows strong similarity to the human zinc finger protein 216, but has no Zn 
finger. 

PROSITE: Contains no zinc finger; no informative BLAST results; no predictive prosite, pfam or 
SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



strong similarity to zinc finger protein 216 

complete cDNA, complete cds, EST hits 
start matches Kozak consensus ANNatgG, 



Sequenced by Qiagen 
Locus: unknown 



Insert length: 1512 bp 

Poly A stretch at pos. 1490, polyadenylation signal at pos. 1474 



1 GGGAGCAAGC AGGGGTTCGG CGGCATTACC TGTACCCATT CACCGGCGGC 

51 TACCGGCGGC GGCGCGTAGC GTGTCAGGCG GAGAGACCCG CCGCCAGGTG 

101 TGCAACTGAG GAACATGGCT CAAGAAACTA ATCACAGCCA AGTGCCTATG 

151 CTTTGTTCCA CTGGCTGTGG ATTTTATGGA AACCCICGTA CAAATGGCAT 

201 GTGTTCAGTA TGCTATAAAG AACATCTTCA AAGACAGAAT AGTAGTAATG 

251 GTAGAATAAG CCCACCTGCA ACCTCTGTCA GTAGTCTGTC TGAATCTTTA 

301 CCAGTTCAAT GCACAGATGG CAGTGTGCCA GAAGCCCAGT CACCATTAGA 

351 CTCTACATCT TCATCTATGC AGCCCAGCCC TGTATCAAAT CAGTCACTTT 

401 TATCAGAATC TGTAGCATCT TCTCAATTGG ACAGTACATC TGTGGACAAA 

451 GCAGTACCTG AAACAGAAGA TGTGCAGGCT TCAGTATCAG ACACAGCACA 

501 GCAGCCATCT GAAGAGCAAA GCAAGCCTCT TGAAAAACCG AAACAAAAAA 

551 AGAATCGCTG TTTCATGTGC AGGAAGAAAG TGGGACTTAC TGGGTTTGAA 

601 TGCCGGTGTG GAAATGTTTA CTGTGGTGTA CACCGTTACT CAGATGTACT 

651 CAATTGCTCT TACAATTACA AAGCCGATGC TGCTGAGAAA ATCAGAAAAG 

701 AAAATCCAGT AGTTGTTGGT GAAAAGATCC AAAAGATTTG AACTCCTGCT 

751 GGAATACAAA ATTCTTGAGC ATCTGCAAAC TAAAAATTGA CTTGAGGTTT 

801 TTTTTTTCCT AGTCATTGGG AATGTAGAGC AGTGTATCTT GCATGTCATC 

851 GGAAGAATAG ATTTTTGTTT TGGTTTTGTT TTGAAAATGA CTCTGAACAT 

901 TTATTTCCAT TGCAATTTCT GTGGCTGAGG AGACTTAAAC TTTACAAGTA 

951 TTATCCTTTT AAGATCATTT TAATTTTAGT TGAGTGCAGA GGGCTTTTAT 

1001 AACAAACGTG CAGAAATTTT GGAGGGCTGT GATTTTTCCA GTATTAAACA 

1051 TGCATGCATT AATCTTGCAG TTTATTTTCT CATTATGTAT GTATATATCG 

1101 CTTTTCTCTG CAGCACGATT TCTCTTTTGA TAATGCCCTT TAGGGCACAA 

1151 CTAGTTATCA GTAACTGAAT GTATCTTAAT CATTATGGCT GCTTCTGTTT 

1201 TTTCATTAAC AAAGGTTATT CATATGTTAG CATATAGTTT CTTTGCACCC 

1251 ACTATTTATG TCTGAATCAT TTGTCACAAG AGAGTGTGTG CTGATGAGAT 

1301 TGTAAGTTTG TGTGTTTAAA CTTTTTTTTG AGCGAGGGAA GAAAAAGCTG 

1351 TATGCATTTC ATTGCTGTCT ACAGGTTTCT TTCAGATTAT GTTCATGGGT 

1401 TTGTGTGTAT ACAATATGAA GAATGATCTG AAGTAATTGT GCTGTATTTA 

1451 TGTTTATTCA CCAGTCTTTG ATTAAATAAA AAGGAAAACC AGAAAAAAAA 
1501 AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 
ORF from 115 bp to 738 bp; peptide length: 208 
Category: strong similarity to known protein 
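The relation between the ORF coordinates and the peptide length can be checked by translating the cDNA from the stated start position. The sketch below is an illustration under stated assumptions: it uses the standard genetic code, `translate` and `orf_codon_count` are hypothetical helper names, and `FRAGMENT` is the row labelled 101 of the listing above (bases 101 to 150), so that base 115 sits at 0-based index 14 of the fragment.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame from its first base, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def orf_codon_count(first_bp, last_bp):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    return (last_bp - first_bp + 1) // 3


# Row labelled 101 of the cDNA listing (bases 101-150).
FRAGMENT = "".join(
    "TGCAACTGAG GAACATGGCT CAAGAAACTA ATCACAGCCA AGTGCCTATG".split()
)
start_index = 115 - 101  # base 115 of the cDNA within the fragment
print(translate(FRAGMENT[start_index:]))
print(orf_codon_count(115, 738))
```

The translated fragment reproduces the first residues of the peptide listed below, and the codon count equals the stated peptide length, which suggests that the printed ORF range does not include the stop codon.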



1 MAQETNHSQV PMLCSTGCGF YGNPRTNGMC SVCYKEHLQR QNSSNGRISP 
51 PATSVSSLSE SLPVQCTDGS VPEAQSALDS TSSSMQPSPV SNQSLLSESV 
101 ASSQLDSTSV DKAVPETEDV QASVSDTAQQ PSEEQSKPLE KPKQKKNRCF 
151 MCRKKVGLTG FECRCGNVYC GVHRYSDVLN CSYNYKADAA EKIRKENPVV 
201 VGEKIQKI 

BLASTP hits 
Entry ATF7H19_1 from database TREMBLNEW: 

gene: "F7H19.10"; product: "putative protein"; Arabidopsis thaliana DNA 
chromosome 4, BAC clone F7H19 (ESSAII project) >TREMBL:ATT12H17_21 
gene: "T12H17.210"; product: "predicted protein"; Arabidopsis thaliana 
DNA chromosome 4, BAC clone T12H17 (ESSAII project) 
Score = 206, P = 2.1e-24, identities = 51/146, positives = 77/146 

Entry PVPVPR3A_1 from database TREMBL: 

gene: "PVPR3"; P. vulgaris PVPR3 protein mRNA, complete cds. 

Score = 237, P = 4.9e-20, identities = 50/136, positives = 73/136 

Entry AF062072_1 from database TREMBL: 

gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc 
finger protein 216 (ZNF216) gene, complete cds. 

Score = 591, P = 1.6e-57, identities = 124/215, positives = 147/215 



Alert BLASTP hits for DKFZphfbr2_16f21, frame 1 

TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus 
zinc finger protein ZNF216 mRNA, complete cds., N = 1, Score = 590, P = 
2.1e-57 

TREMBLNEW:AB001773_1 gene: "pem-6"; product: "PEM-6"; Ciona savignyi 
pem-6 (posterior end mark 6) mRNA, complete cds., N = 1, Score = 421, P 
= 1.7e-39 



>TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus zinc 
finger protein ZNF216 mRNA, complete cds. 
Length = 213 

HSPs: 

Score = 590 (88.5 bits), Expect = 2.1e-57, P = 2.1e-57 
Identities = 123/213 (57%), Positives = 146/213 (68%) 

Query: 1 MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPAT---SVSS 57 

MAQETN + PMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQ +S GR+SP T S S 
Sbjct: 1 MAQETNQTPGPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQQNS-GRMSPMGTASGSNSP 59 

Query: 58 LSESLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSE--SVASSQLDSTSVDKAVP 115 

S + S VQ D + + A STS + PV+ + + ++ S+ D + K 
Sbjct: 60 TSDSASVQRADAGLNNCEGAAGSTSEKSRNVPVAALPVTQQMTEMSISREDKITTPKT-E 118 

Query: 116 ETEDVQASVSDTAQQPSEEQS--KPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVH 173 

+E V S + QPS QS K E PK KKNRCFMCRKKVGLTGF+CRCGN++CG+H 
Sbjct: 119 VSEPVVTQPSPSVSQPSSSQSEEKAPELPKPKKNRCFMCRKKVGLTGFDCRCGNLFCGLH 178 

Query: 174 RYSDVLNCSYNYKADAAEKIRKENPVVVGEKIQKI 208 

RYSD NC Y+YKA+AA KIRKENPVVV EKIQ+I 
Sbjct: 179 RYSDKHNCPYDYKAEAAAKIRKENPVVVAEKIQRI 213 





Pedant information for DKFZphfbr2_16f21, frame 1 



Report for DKFZphfbr2_16f21.1 



[LENGTH] 208 

[MW] 22541.23 

[pI] 6.80 

[HOMOL] TREMBL:AF062072_1 gene: "ZNF216"; product: "zinc finger protein 216"; Homo 
sapiens zinc finger protein 216 (ZNF216) gene, complete cds. 9e-57 

[PIRKW] zinc 8e-13 

[PIRKW] zinc finger 8e-13 
[PIRKW] fusion protein 8e-13 

[SUPFAM] unassigned ubiquitin-related proteins 8e-13 

[SUPFAM] ubiquitin homology 8e-13 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Irregular 

[KW] LOWCOMPLEXITY 7.21 % 



SEQ MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE 

SEG 

PRD ccccccccccccccccccccccccccccccchhhhhhhhhhccccccccccccccccccc 

SEQ SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN 

SEG 

PRD cccccccccccccccccccccccccccceeecccccccceeecccccccccccccccccc 

SEQ CSYNYKADAAEKIRKENPVVVGEKIQKI 

SEG 

PRD ccchhhhhhhhhhhhhcccccccccccc 



Prosite for DKFZphfbr2_16f21.1 



PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 
PS00001 42->46 ASN_GLYCOSYLATION PDOC00001 
PS00001 92->96 ASN_GLYCOSYLATION PDOC00001 
PS00001 180->184 ASN_GLYCOSYLATION PDOC00001 
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006 
PS00006 70->74 CK2_PHOSPHO_SITE PDOC00006 
PS00006 76->80 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00008 22->28 MYRISTYL PDOC00008 
PS00008 166->172 MYRISTYL PDOC00008 


(No Pfam data available for DKFZphfbr2_16f21.1) 
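The short PROSITE motifs listed for this peptide can be located with ordinary regular expressions. The sketch below is an illustration, not the program used for the report: `scan` is a hypothetical helper, the two patterns are the published PROSITE definitions N-{P}-[ST]-{P} (PS00001) and [ST]-x(2)-[DE] (PS00006) rewritten as regular expressions, and the convention that the printed end coordinate equals start plus motif length is an inference from the table above.

```python
import re

# 208-residue peptide of frame 1, taken from the SEQ lines of the report.
PEPTIDE = (
    "MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE"
    "SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV"
    "QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN"
    "CSYNYKADAAEKIRKENPVVVGEKIQKI"
)

PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",       # PS00006: [ST]-x(2)-[DE]
}


def scan(peptide, regex):
    """Return (start, end) pairs, start 1-based, end = start + motif length.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    hits = []
    for match in re.finditer(f"(?=({regex}))", peptide):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


glyc_sites = scan(PEPTIDE, PATTERNS["ASN_GLYCOSYLATION"])
ck2_sites = scan(PEPTIDE, PATTERNS["CK2_PHOSPHO_SITE"])
print(glyc_sites)
print(ck2_sites)
```

Both scans return the same coordinates as the table, which also serves as a check that the peptide and the site list were transcribed consistently.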
DKFZphfbr2_16gl8 



group: cell cycle 

DKFZphfbr2_16gl8.3 encodes a novel 984 amino acid protein with similarity to centromeric 
proteins of yeasts. 

The novel protein shows similarity to S. pombe SPAC17A5.07c and the S. cerevisiae Smt4p 
suppressor of MIF2 gene. MIF2 encodes a centromeric protein with homology to the mammalian 
centromeric protein CENP-C. Mutations in MIF2 stabilise dicentric minichromosomes and confer 
high instability to chromosomes that bear a cis-acting mutation in element I of the yeast 
centromeric DNA (CDEI). Therefore the new protein should be involved in centromere 
organisation, too. 

The new protein can find application in modulating/blocking the cell cycle and influencing the 
behavior of chromosomes, both natural and artificial in eukaryotic cells. 



similarity to KIAA0797 and yeast Smt4p 
complete cDNA, complete cds, EST hits 

the yeast Smt4 protein seems to be involved in centromere function 
and microtubule organisation 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 4826 bp 

Poly A stretch at pos. 4756, polyadenylation signal at pos. 4736 



1 

51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 
2101 
2151 
2201 
2251 
2301 



GGGTCGAGGT 
TTTTCCTTTC 
ACAGCGCCTG 
AGCTCGGGCG 
AAAAAGTCAT 
ACCAGAGGAT 
AACGCTGGAC 
ATCTCTCTAG 
TTCCAGGTCA 
TGGGAACGGA 
GGAAGTTTGA 
ATCTGATGGC 
GTTATTTATC 
TCTGCAAAGC 
CATTTCTCTT 
GTAGAGGTTG 
TATTCTGATT 
AAGGCTTAGA 
AGTCAACAGA 
GAGTTTGAAA 
AGAAATTACA 
TTAACAGTCA 
GGTTCAACCA 
GATTTCTTCC 
AGCCTATTCT 
GAACCAATTG 
TTCAGAAATT 
AAAATGAGAG 
TGTGAATCTG 
GGAGAACATT 
ATTTTATATT 
GGTTGTGTTA 
CCTGAATGAG 
GGTTATGGAA 
CTTTTCTTCT 
AGAACACTCT 
TTGAACTACA 
ATTATGACGG 
GTTGTCTTGG 
AAAGTTCTTT 
GGTGTTGCTG 
AAACACAGAT 
GCGGTTGCTA 
GAAGTCAGGC 
ACCTACTAAG 
AAGAAGGAGA 
CTTATATTGG 



CGACGGTATC 
CCCTCCCCCT 
CAACTGAAAT 
ACGGCCATCT 
CTTCTGATTT 
GTCCATGTTC 
TCTCCCTTTG 
ACCATAAAAA 
TCACCAGAAA 
GTTAGGAAGA 
GTGATACAGA 
AGCCTAGAAT 
TGAAAGGGGC 
AGACTGCGCA 
TTAATATCTG 
TGATCATCTC 
CAAAAGTGGA 
AATAATTTAC 
ACAGACAAAA 
GGCCAAGTGA 
ACTAAACCTA 
GGAGTTGACT 
CTGAAACCGT 
CTGGTTGAGA 
AAGAGGACAT 
TTGTTTCCAG 
CTTAAGTTAC 
TACTTCTGAA 
TACAGATGTC 
TCCAGTATTA 
TACTTCTGTT 
CAATCACAAA 
ATTTCATTGC 
AAGTAAGGAT 
GGGTCTCTTC 
GTATTAAGCC 
CAATCCTGTT 
AAATAAGTAT 
GTTCAGGCAT 
TATTCATTAT 
TTGCTGAAGA 
GCGGCCAAGC 
CTCCCTTTCT 
ACACTGGACT 
GGGGGATTGG 
GTTTCTTAAT 
AGAAGGCATC 



GATAAGTTTT 
CCCTCTCCAA 
TTCAGCAGCG 
TCATCCGAAA 
ATCGGAGATA 
AATCACCACT 
CAGTGGGAAA 
TAAAAAACAT 
GGATACCCAG 
AAATACATAA 
CAACTTGCAA 
CTTATCAAAA 
TCACAACGAA 
CAATAAAGAA 
ATACTCAGCC 
GAACAGGAAA 
ACTCACTCTG 
CTGATTCTCA 
AAACAAGAAG 
AAACTATCAT 
CAAAAAGTGA 
TTGAGTAATG 
TGAGTACTCT 
AGGATGAGAA 
AATGAAGGGA 
TGATGAAGAA 
AATCTAAGCA 
TCAGCATTGT 
ATCTGAATTA 
TGCCTAGTAA 
TATATTGGTA 
AAAATATATT 
TAGTGGATAC 
GATAATCACA 
AGATTATCTT 
AGCAATCAAA 
TCACAGAGAG 
AATCAGTGGA 
TTCCTTTGTT 
TACTGTGTTT 
AATGAAGCTG 
CTACTTACAC 
ATTACATCTA 
TGTTCAGAAG 
GAGTAACTAA 
GATGTAATCA 
AGATGAACTT 



GCCGGAGGGG 
GGAGAAGATG 
TCATCACAGA 
AGAAAGATGT 
GTCCAAATTC 
GAAGCCTAAG 
ATCCGAGGGT 
AGTTATATTG 
GGACCCCACC 
TCAGAGCAAC 
TCTAAACCCT 
GTAAGACAGT 
AAACGAAGAA 
TGAAGACCTT 
GCAGAAACAA 
ATTTCCAGGA 
ATATTGTACT 
ATGACTCAAC 
CAGGATCCAA 
TTTTACTAAG 
CCACCAAAAG 
AATTCCATTG 
TGAGTTGAAT 
ACCAATCACT 
GGACCTGTTG 
AGACCGTGAG 
TAGAACTACC 
TGCCCATATA 
TGAGATGGAT 
AAATAAAAGG 
AAGATCCCAT 
CACACATTTA 
GTAAAAGGAG 
CAAGAGATTC 
ATCTAGTGAA 
AAGAATTGAA 
GAATTAGAGC 
TCAGAACCTC 
CAACTTGTTC 
AAATCAGTAT 
CTTCCTGCAG 
ATCCAGATGA 
TTGATTGTAT 
TGAAGATCTG 
TTGATTTTTA 
GTTGAACGAA 



TCCTGAGGTG 
GACAAGAGAA 
AGGAAAAAGG 
TAAATGCAAA 
AGAAGCTCAG 
GAATAAAGTC 
GTCCTGTTAC 
ACGAATGTCC 
TGTAACTGAG 
TTTCTTCATC 
CACAAGAGCT 
AGATGACAAT 
AGGATGATGG 
AACAGTGGAA 
GGATGTTAAA 
AGACAAAGAG 
TCTTTGGATA 
AATATCCACT 
AACTGCCTGA 
CTATCCTCAC 
TGCCTCTGCC 
ATATTGTGGG 
ACCATAGAAA 
GATCTCAGCT 
AACATAAAAG 
ACAACTAATG 
ATTGATTACA 
ATCCTGTCAT 
CTACAACTGG 
AGCTTCTAAA 
TTCAAGTGTC 
AAGCGGTTTG 
TCATGCTATT 
AGACCCAATT 
TTCATTTTCC 
GCTGAAAGAT 
TTTCTTACCC 
TCTTCAAAAG 
TTTCCCTGCT 
CTCAGCCCTC 
AAGCAAAGTA 
AGAATGGCGG 
ATCCTCCACC 
GAGTGTTTAG 
CCTTAAGTAT 
GTCACATTTT 
2351 TAGTAGCTTT TTCTATAAAT GCTTGACAAG AAAGGAAAAT AATTTAACAG 
2401 AAGATAATCC AAATCTTTCA ATGGCACAGA GAAGACATAA AAGAGTAAGA 
2451 ACATGGACTC GTCACATAAA CATTTTTAAT AAAGATTACA TCTTTGTACC 
2501 TGTAAATGAG TCGTCTCACT GGTATCTCGC AGTCATTTGT TTTCCATGGT 
2551 TAGAAGAAGC TGTGTATGAA GATTTTCCAC AAACTGTATC CCAGCAGTCC 
2601 CAGGCTCAGC AGTCCCAAAG TGACAACAAA ACAATAGATA ATGATCTACG 
2651 TACTACTTCG ACACTGTCTT TGAGTGCAGA GGATTCCCAA AGTACCGAGT 
2701 CGAATATGTC AGTACCAAAG AAAATGTGTA AAAGGCCATG TATTCTTATA 
2751 CTAGACTCCT TGAAAGCTGC TTCTGTACGA AACACAGTTC AGAATTTACG 
2801 AGAGTATTTA GAGGTAGAGT GGGAAGTTAA ACTAAAAACT CATCGTCAAT 
2851 TCAGCAAAAC AAACATGGTG GATCTATGCC CTAAAGTTCC TAAACAGGAC 
2901 AATAGCAGTG ATTGTGGAGT ATATTTATTG CAGTATGTGG AAAGCTTCTT 
2951 CAAGGATCCT ATTGTTAACT TTGAACTTCC AATTCATTTG GAGAAGTGGT 
3001 TTCCTCGTCA TGTAATAAAG ACCAAACGGG AAGATATTCG AGAGCTCATC 
3051 TTGAAACTTC ATTTACAGCA ACAGAAGGGC AGCAGTAGCT AGTTAATCTG 
3101 TACAAACATG ACACAGATGT TCTCTAAGAT TACTGGAAAG CCCCTTACCA 
3151 GCATTTGTGT TAGCCAGCTC ACAGAGAAGA AAATAACTTG CAGTAGTTTT 
3201 ATAATAAGTC ATTGGAACAT TATTTAAAAT ATGTAGGACA CATTATTAGA 
3251 ATTGTTGGGA TCTCATAGAT GGAATGGGAA TGGGGGTGAT ATAGATAAAC 
3301 TTACTAGATA TAAATTAAAA TTTTATAAAT ATTTCATATT TTTCTGAGTA 
3351 AATATGATTG GATTATGCAA CAGCATATGT AATATGGGAA TGTTTTGTAG 
3401 ATAATAAAAC TTACATGATC TGTACTTCCA CGTGACTGGG TGCTGAGGGG 
3451 AGTTAAAGCC TCCCTGGTGC CAGCCCCAGT GCTTGTCAAA TTTGCTGACA 
3501 GGTCACATCA TATTGTAATT CTATTCTTTG CAGCTCAAGC ATGCAGTATG 
3551 AATACTGTGT ATTTTTTAAA AAAATAATTT AGTATCAAGG CTTCAGAAAA 
3601 TGCCATTTAC GGCATCCCTT CTGTATGTAA CAAAAAGACA TTCATAATGT 
3651 TAGGAAGATG ATAAAAATTC GCTCTTTTAA AGTGCAGCTT ATTATTCTCA 
3701 ATTGCTAAAT ACGATTACTC TGCTTTTTTT TTTTCATTTC TTTTGATGTC 
3751 ATATGTGAGT ATCTTATAAT TTAGTTCATT TGTTCAGGGT AAAATTTGAA 
3801 ACAAAAAATT TTACCTGTGC AAAATAGTTT TTTAAAAATT ATACATGTAG 
3851 CTCAACTTGA GGTACTGCTA TATAAATATT CACTCACATT ATCACGGAAT 
3901 TTATGTATAG TTTCTCTAAT ATAGAAGATA AAATTGGTGT CCTCATAACT 
3951 TTAACAAAGA AAACCCTCAG TCCTATTTAT TAATGGGTAG AATTAAATAT 
4001 ATAATTTTAT AGCTCAGTTT ACCCAGTATT CATCTGCAAA GCCAGATTGC 
4051 TCTCATTGCT TTTATATTTT TAAATTGTAG CTTTTAGAGA CCTATGATCC 
4101 TCATGGAACT TAATTTTTTA TTAAATATTC AGGTAACAGT TCTGAATTCA 
4151 TGTGATAATG GTGGCATTAT ATATGATTAA ACACTTCAGA ACTTTCTAAT 
4201 GTTATCAGGA GTATTTTGAG GGAGATATGA TTATATTGTA TTTTCTCAGA 
4251 TAAGAAAAAT GTTTTTTAAC AATATTATTT TAATCTGTTT TAAGCATCTC 
4301 TTAGATTTAC ATTATAACTA CATAAAGCAG TGAAGCAAAG GCAAATTAAG 
4351 ATAAAGCTAG AAAGTCTGAA CATTTTATTT CAAAATCATA CGAATCGGGG 
4401 TCAGTTAAGC CTCAGTATTC TTAGCTTTTG TTGATTTTGG CACTATCTTT 
4451 ATATTATTAA ATATATTTGT TGTTTGGATA TTTCATATAA AGATGGCTAT 
4501 AATTACATAT TTCATTCCCA ATTTGTGTGT GTTGGGGGGT ACTTTTAAAG 
4551 GTGACTATTG TTTTGTACAT CTAATTTTGG GAAACCAAGT CTATAAGACA 
4601 TCTTGTGATT TCTTAATGTT TTTGTTTGTA TGTTTTTCAA AGATATCACT 
4651 GTCCTTTATC ATGTTTTGAA GATTGTTTAA AATTCATTTT CCTAAATTAA 
4701 TGTGCAAGTA ATGTTTTGAG GATATCGGTG TTTTATATTA AACATATTTC 
4751 CAATTCAAAA AAAAAAAAAA AAAAACTTAT CGATACCGTC GACCTCGATG 
4801 ATGATGATGA TGATGATGAT GTCGAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 138 bp to 3089 bp; peptide length: 984 
Category: similarity to known protein 



1 MDKRKLGRRP SSSEIITEGK RKKSSSDLSE IRKMLNAKPE DVHVQSPLSK 
51 FRSSERWTLP LQWERSLRNK VISLDHKNKK HIRGCPVTSR SSPERIPRVI 
101 LTNVLGTELG RKYIRTPPVT EGSLSDTDNL QSEQLSSSSD GSLESYQNLN 
151 PHKSCYLSER GSQRSKTVDD NSAKQTAHNK EKRRKDDGIS LLISDTQPED 
201 LNSGSRGCDH LEQESRNKDV KYSDSKVELT LISRKTKRRL RNNLPDSQYC 
251 TSLDKSTEQT KKQEDDSTIS TEFERPSENY HQDPKLPEEI TTKPTKSDFT 
301 KLSSLNSQEL TLSNATKSAS AGSTTETVEY SNSIDIVGIS SLVEKDENEL 
351 NTIEKPILRG HNEGNQSLIS AEPIVVSSDE EGPVEHKSSE ILKLQSKQDR 
401 ETTNENESTS ESALLELPLI TCESVQMSSE LCPYNPVMEN ISSIMPSNEM 
451 DLQLDFIFTS VYIGKIKGAS KGCVTITKKY IKIPFQVSLN EISLLVDTTH 
501 LKRFGLWKSK DDNHSKRSHA ILFFWVSSDY LQEIQTQLEH SVLSQQSKSS 
551 EFIFLELHNP VSQREELKLK DIMTEISIIS GELELSYPLS WVQAFPLFQN 
601 LSSKESSFIH YYCVSTCSFP AGVAVAEEMK LKSVSQPSNT DAAKPTYTFL 
651 QKQSSGCYSL SITSNPDEEW REVRHTGLVQ KLIVYPPPPT KGGLGVTNED 
701 LECLEEGEFL NDVIIDFYLK YLILEKASDE LVERSHIFSS FFYKCLTRKE 
751 NNLTEDNPNL SMAQRRHKRV RTWTRHINIF NKDYIFVPVN ESSHWYLAVI 
801 CFPWLEEAVY EDFPQTVSQQ SQAQQSQSDN KTIDNDLRTT STLSLSAEDS 
851 QSTESNMSVP KKMCKRPCIL ILDSLKAASV RNTVQNLREY LEVEWEVKLK 
901 THRQFSKTNM VDLCPKVPKQ DNSSDCGVYL LQYVESFFKD PIVNFELPIH 
951 LEKWFPRHVI KTKREDIREL ILKLHLQQQK GSSS 


BLASTP hits 



Entry SPAC17A5_7 from database TREMBL: 

"SPAC17A5.07c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c17A5. Schizosaccharomyces pombe (fission 
yeast) 

Length = 652 

Score = 275 (96.8 bits), Expect = 1.9e-29, Sum P(3) = 1.9e-29 
Identities = 56/120 (46%), Positives = 78/120 (65%) 



Entry S49947 from database PIR: 

SMT4 protein - yeast (Saccharomyces cerevisiae) 
Length = 1034 

Score = 163 (57.4 bits), Expect = 4.6e-16, Sum P(3) = 4.6e-16 
Identities = 46/159 (28%), Positives = 76/159 (47%) 



Entry YQG6_CAEEL from database SWISSPROT: 

HYPOTHETICAL 35.7 KD PROTEIN C41C4.6 IN CHROMOSOME II. 
Length = 342 

Score = 162 (57.0 bits), Expect = 6.1e-13, Sum P(3) = 6.1e-13 
Identities = 37/119 (31%), Positives = 62/119 (52%) 



Entry AB018340_1 from database TREMBL: 

gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for 

KIAA0797 protein, partial cds. 

Score = 540, P = 1.9e-50, identities = 120/243, positives = 155/243 



Alert BLASTP hits for DKFZphfbr2_16gl8, frame 3 

TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 
Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII 
project), N = 2, Score = 239, P = 2.1e-18 



>TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 

Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII project) 
Length = 710 



HSPs: 

Score = 239 (35.9 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 51/135 (37%), Positives = 78/135 (57%) 

Query: 683 IVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLKYLILEKASDELVERSHIFSSFF 742 

+VYP + V +D+E L+ F+ND IIDFY+KYL + S + R H F+ FF 
Sbjct: 176 LVYPQGEPDAVV-VRKQDIELLKPRRFINDTIIDFYIKYL-KNRISPKERGRFHFFNCFF 233 

Query: 743 YKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIFNKDYIFVPVNESSHWYLAVICF 802 

+ RK NL + P+ + ++RV+ WT+++++F KDYIF+P+N S HW L +IC 
Sbjct: 234 F----RKLANLDKGTPSTCGGREAYQRVQKWTKNVDLFEKDYIFIPINCSFHWSLVIICH 289 

Query: 803 PWLEEAVYEDFPQTV 817 

P + + PQ V 
Sbjct: 290 PGELVPSHVENPQRV 304 

Score = 70 (10.5 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 13/28 (46%), Positives = 15/28 (53%) 

Query: 948 PIHLEKWFPRHVIKTKREDIRELILKLH 975 

P HL WFP KR +I EL+ LH 
Sbjct: 403 PSHLRNWFPAKEASLKRRNILELLYNLH 430 

Pedant information for DKFZphfbr2_16gl8, frame 3 



Report for DKFZphfbr2_16gl8.3 

[LENGTH] 984 

[MW] 112265.80 

[pI] 6.13 

[HOMOL] TREMBL:AB018340_1 gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens 
mRNA for KIAA0797 protein, partial cds. 8e-53 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL031w] 9e-17 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL020c] 4e-06 

[BLOCKS] BL00494C Bacterial luciferase subunits proteins 

[PROSITE] AMIDATION 3 

[PROSITE] MYRISTYL 9 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 30 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 19 

[PROSITE] ASN_GLYCOSYLATION 12 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.47 % 



SEQ MDKRKLGRRPSSSEIITEGKRKKSSSDLSEIRKMLNAKPEDVHVQSPLSKFRSSERWTLP 

SEG 

PRD ccccceeecccceeeeecccccccccchhhhhhhhhhccccccccccccccccccccchh 

SEQ LQWERSLRNKVISLDHKNKKHIRGCPVTSRSSPERIPRVILTNVLGTELGRKYIRTPPVT 

SEG 

PRD hhhhhhhhhheeeeccccceeeccccccccccccceeeeeeeeeccceeeccceeecccc 

SEQ EGSLSDTDNLQSEQLSSSSDGSLESYQNLNPHKSCYLSERGSQRSKTVDDNSAKQTAHNK 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ EKRRKDDGISLLISDTQPEDLNSGSRGCDHLEQESRMKDVKYSDSKVELTLISRKTKRRL 

SEG 

PRD hhhhcccceeeeecccccccccccccccccccccccccccccccccceeeeeehhhhhhh 

SEQ RNNLPDSQYCTSLDKSTEQTKKQEDDSTISTEFERPSENYHQDPKLPEEITTKPTKSDFT 

SEG 

PRD hccccccccccccccccchhhhhccccccccccccccccccccccccccccccccccccc 

SEQ KLSSLNSQELTLSNATKSASAGSTTETVEYSNSIDIVGISSLVEKDENELNTIEKPILRG 

SEG 

PRD ccccccccceeehhhhhhhcccccceeeeccceeeceeeccchhhhhhhhhhhccccccc 

SEQ HNEGNQSLISAEPIVVSSDEEGPVEHKSSEILKLQSKQDRETTNENESTSESALLELPLI 

SEG xxxxxxxxxxxxxxxxx . . . 

PRD cccccceeeecceeeeecccccccccchhhhhhhhhhhhhhcccccccchhhhhccccce 

SEQ TCESVQMSSELCPYNPVMENISSIMPSNEMDLQLDFIFTSVYIGKIKGASKGCVTITKKY 

SEG 

PRD eecccccccccccccccccceeeccccchhhhhhheeeeeeeeeeeeccccceeeeeeee 

SEQ IKIPFQVSLNEISLLVDTTHLKRFGLWKSKDDNHSKRSHAILFFWVSSDYLQEIQTQLEH 

SEG 

PRD eeeeccccceeeeeeecccceeeeeeeecccccccccceeeeeeeeccchhhhhhhhhhh 

SEQ SVLSQQSKSSEFIFLELHNPVSQREELKLKDIMTEISIISGELELSYPLSWVQAFPLFQN 

SEG 

PRD hhhhccccceeeeeeeeccccccchhhhhhhhhheeeeeccceeeeccceeeeeeceeec 

SEQ LSSKESSFIHYYCVSTCSFPAGVAVAEEMKLKSVSQPSNTDAAKPTYTFLQKQSSGCYSL 

SEG 

PRD ccccccccceeeeecccccccchhhhhhhhhhhcccccccccccccceeeecccccccce 

SEQ SITSNPDEEWREVRHTGLVQKLIVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLK 

SEG 

PRD eeccccccceeeeeeccceeeeeeecccccccccccccchhhhhhhhccchhhhhhhhhh 

SEQ YLILEKASDELVERSHIFSSFFYKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ NKDYIFVPVNESSHWYLAVICFPWLEEAVYEDFPQTVSQQSQAQQSQSDNKTIDNDLRTT 

SEG xxxxxxxxxxx 

PRD cceeeeeccccccceeeeeeeccchhhhhhhccccchhhhhhhhhhcccccccccccccc 

SEQ STLSLSAEDSQSTESNMSVPKKMCKRPCILILDSLKAASVRNTVQNLREYLEVEWEVKLK 

SEG 

PRD cceeeeecccccceeeccccccccccceeeeeccccccccchhhhhhhhhhhhhhhhhhh 

SEQ THRQFSKTNMVDLCPKVPKQDNSSDCGVYLLQYVESFFKDPIVNFELPIHLEKWFPRHVI 
SEG 

PRD hhhhhccccccccccccccccccccceeeeehhhhhhhcccceeecccccccccccchhh 

SEQ KTKREDIRELILKLHLQQQKGSSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccc 



Prosite for DKFZphfbr2_16gl8.3 

PS00008 505->511 MYRISTYL PDOC00008 

PS00008 622->628 MYRISTYL PDOC00008 

PS00008 693->699 MYRISTYL PDOC00008 

PS00009 6->10 AMIDATION PDOC00009 

PS00009 18->22 AMIDATION PDOC00009 

PS00009 109->113 AMIDATION PDOC00009 



(No Pfam data available for DKFZphfbr2_16gl8.3) 

DKFZphfbr2_16il2 



group: transmembrane protein 

DKFZphfbr2_16il2 encodes a novel 185 amino acid protein, with strong similarity to PUT2 
protein of Fugu rubripes. 

The novel protein contains 1 transmembrane region. 

PUT2 is a Fugu rubripes protein similar to the neural cell adhesion molecule L1 (L1-CAM), a 
mitosis-specific chromosome segregation protein (SMC1) and the calcium channel alpha-1 subunit 
homolog (CCA1). 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



strong similarity to Fugu rubripes PUT2 

complete cDNA, complete cds, EST hits, 
TRANSMEMBRANE 1 

Sequenced by LMD 

Locus: /map="873.3/875.1 cR from top of Chr1 linkage group" 
Insert length: 1552 bp 

Poly A stretch at pos. 1528, polyadenylation signal at pos. 1506 
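The two positions given above can be related to the printed cDNA by searching the 3' end for a polyadenylation hexamer and for a run of adenines. The sketch below is an illustration only: `find_polya_features` is a hypothetical helper, the hexamer pattern (AATAAA or ATTAAA) and the minimum run length of ten A residues are assumptions, and reading the reported positions as 0-based offsets is an inference from comparing them with the row labelled 1501 of the listing that follows.

```python
import re


def find_polya_features(fragment, offset):
    """Return 0-based offsets (signal, poly-A start) within the full cDNA.

    fragment: part of the cDNA sequence; offset: 0-based index of its first base.
    Either value is None if the corresponding feature is not found.
    """
    signal = re.search(r"A[AT]TAAA", fragment)  # AATAAA or ATTAAA
    tail = re.search(r"A{10,}", fragment)       # first run of >= 10 adenines
    return (
        offset + signal.start() if signal else None,
        offset + tail.start() if tail else None,
    )


# Row labelled 1501 of the cDNA listing; its first base has 0-based index 1500.
last_row = "".join(
    "GCCATCATTA AATTGTTTTA TTTCTCTCAA AAAAAAAAAA AAAAAAAATA".split()
)
signal_pos, polya_pos = find_polya_features(last_row, 1500)
print(signal_pos, polya_pos)
```

Under these assumptions the search returns the same two positions as stated in the report; in 1-based coordinates, as used for the row labels, each value would be one higher.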



1 GGGGGGGGAC AACTGGGTCT TTTGCGGCTG CAGCGGGCTT GTAGGCGTCC 

51 GGCTTTGCTG GCCCAGCAAG CCTGATAAGC ATGAAGCTCT TATCTTTGGT 

101 GGCTGTGGTC GGGTGTTTGC TGGTGCCCCC AGCTGAAGCC AACAAGAGTT 

151 CTGAAGATAT CCGGTGCAAA TGCATCTGTC CACCTTATAG AAACATCAGT 

201 GGGCACATTT ACAACCAGAA TGTATCCCAG AAGGACTGTT GTAGCAACTG 

251 CCTGCACGTG GTGGAGCCCA TGCCAGTGCC TGGCCATGAC GTGGAGGCCT 

301 ACTGCCTGCT GTGCGAGTGC AGGTACGAGG AGCGCAGCAC CACCACCATC 

351 AAGGTCATCA TTGTCATCTA CCTGTCCGTG GTGGGTGCCC TGTTGCTCTA 

401 CATGGCCTTC CTGATGCTGG TGGACCCTCT GATCCGAAAG CCGGATGCAT 

451 ACACTGAGCA ACTGCACAAT GAGGAGGAGA ATGAGGATGC TCGCTCTATG 

501 GCAGCAGCTG CTGCATCCCT CGGGGGACCC CGAGCAAACA CAGTCCTGGA 

551 GCGTGTGGAA GGTGCCCAGC AGCGGTGGAA GCTGCAGGTG CAGGAGCAGC 

601 GGAAGACAGT CTTCGATCGG CACAAGATGC TCAGCTAGAT GGGCTGGTGT 

651 GGTTGGGTCA AGGCCCCAAC ACCATGGCTG CCAGCTTCCA GGCTGGACAA 

701 AGCAGGGGGC TACTTCTCCC TTCCCTCGGT TCCAGTCTTC CCTTTAAAAG 

751 CCTGTGGCAT TTTTCCTCCT TCTCCCTAAC TTTAGAAATG TTGTACTTGG 

801 CTATTTTGAT TAGGGAAGAG GGATGTGGTC TCTGATCTCT GTTGTCTTCT 

851 TGGGTCTTTG GGGTTGAAGG GAGGGGGAAG GCAGGCCAGA AGGGAATGGA 

901 GACATTCGAG GCGGCCTCAG GAGTGGATGC GATCTGTCTC TCCTGGCTCC 

951 ACTCTTGCCG CCTTCCAGCT CTGAGTCTTG GGAATGTTGT TACCCTTGGA 

1001 AGATAAAGCT GGGTC1TCAG GAACTCAGTG TTTGGGAGGA AAGCATGGCC 

1051 CAGCATTCAG CATGTGTTCC TTTCTGCAGT GGTTCTTATC ACCACCTCCC 

1101 TCCCAGCCCC AGCGCCTCAG CCCCAGCCCC AGCTCCAGCC CTGAGGACAG 

1151 CTCTGATGGG AGAGCTGGGC CCCCTGAGCC CACTGGGTCT TCAGGGTGCA 

1201 CTGGAAGCTG GTGTTCGCTG TCCCCTGTGC ACTTCTCGCA CTGGGGCATG 

1251 GAGTGCCCAT GCATACTCTG CTGCCGGTCC CCTCACCTGC ACTTGAGGGG 

1301 TCTGGGCAGT CCCTCCTCTC CCCAGTGTCC ACAGTCACTG AGCCAGACGG 

1351 TCGGTTGGAA CATGAGACTC GAGGCTGAGC GTGGATCTGA ACACCACAGC 

1401 CCCTGTACTT GGGTTGCCTC TTGTCCCTGA ACTTCGTTGT ACCAGTGCAT 

1451 GGAGAGAAAA TTTTGTCCTC TTGTCTTAGA GTTGTGTGTA AATCAAGGAA 

1501 GCCATCATTA AATTGTTTTA TTTCTCTCAA AAAAAAAAAA AAAAAAAATA 

1551 TC 



BLAST Results 



Entry HS808349 from database EMBL: 
human STS WI-11986. 
Score = 1716, P = 5.7e-73, identities = 364/378 

Entry HS487355 from database EMBL: 
human STS WI-13088. 
Score = 1358, P = 1.3e-56, identities = 274/277 



Medline entries 
No Medline entry 



Peptide information for frame 3 



ORF from 81 bp to 635 bp; peptide length: 185 
Category: similarity to unknown protein 



1 MKLLSLVAVV GCLLVPPAEA NKSSEDIRCK CICPPYRNIS GHIYNQNVSQ 

51 KDCCSNCLHV VEPMPVPGHD VEAYCLLCEC RYEERSTTTI KVIIVIYLSV 

101 VGALLLYMAF LMLVDPLIRK PDAYTEQLHN EEENEDARSM AAAAASLGGP 

151 RANTVLERVE GAQQRWKLQV QEQRKTVFDR HKMLS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16i12, frame 3 

TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu 
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, 
complete cds; putative protein 1 (PUT1) gene, partial cds; 
mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) 
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) 
and putative protein 2 (PUT2) genes, partial cds, complete sequence., N 
= 1, Score = 655, P = 2.8e-64 

TREMBL:CER12C12_5 gene: "R12C12.6"; Caenorhabditis elegans cosmid 
R12C12., N = 1, Score = 225, P = 1e-18 



>TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu 

rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete 
cds; putative protein 1 (PUT1) gene, partial cds; mitosis-specific 
chromosome segregation protein SMC1 homolog (SMC1) gene, complete cds; and 
calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2 
(PUT2) genes, partial cds, complete sequence. 
Length = 187 

HSPs: 

Score = 655 (98.3 bits), Expect = 2.8e-64, P = 2.8e-64 
Identities = 124/163 (76%), Positives = 140/163 (85%) 

Query:  22 KSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHVVEPMPVPGHDVEAYCLLCECR 81 
           KS +D+RCKCICPPYRNISGHIYN+N +QKDC  NCLHVV+PMPVPG+DVEAYCLLCEC+ 
Sbjct:  31 KSFDDVRCKCICPPYRNISGHIYNRNFTQKDC--NCLHVVDPMPVPGNDVEAYCLLCECK 88 

Query:  82 YEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRKPDAYTEQLHNEEENEDARSMA 141 
           YEERST TI+V I+I+LSVVGALLLYM FL+LVDPLIRKPD   + LHNEE++ED + 
Sbjct:  89 YEERSTNTIRVTIIIFLSVVGALLLYMLFLLLVDPLIRKPDPLAQTLHNEEDSEDIQPQM 148 

Query: 142 AAAASLGGP-RANTVLERVEGAQQRWKLQVQEQRKTVFDRHKML 184 
           +     G P R NTVLERVEGAQQRWK QVQEQRKTVFDRHKML 
Sbjct: 149 S-----GDPARGNTVLERVEGAQQRWKKQVQEQRKTVFDRHKML 187 
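
The percentages in the HSP header can be recomputed from the raw counts. The sketch below assumes, based only on the figures printed in this report (for example 140/163 is shown as 85%, not 86%), that the program truncates the percentage rather than rounding it. `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return matches * 100 // length


print(blast_percent(124, 163))  # 76  (Identities)
print(blast_percent(140, 163))  # 85  (Positives)
```

Note that the denominator, 163, equals the number of query residues spanned (22 to 184), not the number of alignment columns including gaps.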



Pedant information for DKFZphfbr2_16i12, frame 3 



Report for DKFZphfbr2_16i12.3 



[LENGTH] 185 

[MW] 20764.29 

[pi] 6.21 

[HOMOL] TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu rubripes 
neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1 
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) 
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2 
(PUT2) genes, partial cds, complete sequence. 3e-68 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] SIGNAL_PEPTIDE 21 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 2.70 % 
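
The LOW_COMPLEXITY keyword can be cross-checked against the SEG rows of the report that follows. A minimal sketch, assuming each `x` in a SEG row marks one masked residue and that the percentage is taken over the full peptide length (5 masked residues out of 185 gives the reported 2.70 %). The function name is illustrative.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Percentage of residues masked with 'x' across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / peptide_length, 2)


# SEG rows of this report: only the third row carries a mask.
print(low_complexity_percent(["", "", "xxxxx", ""], 185))  # 2.7
```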



SEQ MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV 

SEG 

PRD ccceeeeeeeeccccccccccccccceeeeeecccccccccceeeccccccccccceeee 

MEM 

SEQ VEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRK 

SEG 

PRD eecccccccccchhhhhhhhhhhhccccceeeeeeehhhhhhhhhhhhhhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM... 

SEQ PDAYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDR 

SEG xxxxx 

PRD ccchhhhhhhhhcccchhhhhhhhhhccccccchhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM 

SEQ HKMLS 

SEG 

PRD hhccc 

MEM 



Prosite for DKFZphfbr2_ 

(No Pfam data available for DKFZphfbr2_16i12.3) 

DKFZphfbr2_16k22 



group: brain derived 

DKFZphfbr2_16k22 encodes a novel 108 amino acid protein with very weak similarity to 
thioredoxin of Bacillus subtilis. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

weak similarity to thioredoxin 

complete cDNA, complete cds, genomic DNA? 
no EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2088 bp 

Poly A stretch at pos. 2065, no polyadenylation signal found 

1 AAAAGGAAGA AGGAAATAAG GATATTTCAA GGGTTACCAA AGTCGAGGAA 

51 AACTATTTTA AGAAGAAATC TGAATTATTT GTGCACATAG GTTGTAATAA 

101 TAGCATCTTG CATTAAATGG TGTTTTCTAG CTTACAAAGT GGATTCATAT 

151 ACACTATTGT AACTGACTCT CTACAAACTT GCAAGGTTAG CAAGACAAAT 

201 GGTATTTTAA GATAACAAAC TGAGACTCAA AAAAGGCAAG TAACTCGTTC 

251 TACTTCCCAA AGCCAGAAAG TGGCAAAATA GAAAATGGAT CCTGAATCTC 

301 CAACACCATG CAAACTAAGA GAGGGAATCC TCTGTAGAGG GAATGGAAGT 

351 AAAAAGGCAC AAGTGGTGAT GTCACCTTCT GAACAGAGAT GGAACTTTTC 

401 TTCCTCTGAG AAAAAAGAGA AAAGATAGTT TTAAGTGGCA AAAGAACATG 

451 AAGCAATGTG AGGTGAAGAA ACAGAAAAGA CTATGGATGG AATTCCTAGA 

501 TGTGAGATAC ACAAAGTTCC ATTTCAAAGA GAAATATCTA TAGATAGGCA 

551 TAAAGTTACA CACCTGAACT ACCAACTCTG AACCAGTAAC TCAAGAGATA 

601 TTTTGTGTGT CCCACAAGCC ATATGGCTCT GGGGACAAAT TATCTGAAAG 

651 TGCCCAATAA GAAAAATATT TGAGGAAGGG GAGTTGGTGA GTGAATGAAT 

701 TAAAGGACAT CAGAAAGATA CATTGACTGT TCTCCTTCCC AGGAAACAAA 

751 GTGGCTAAGT CAAAACAACG GGCAGCTGTG GGATAGCAAA GAAAAAAAAA 

801 CTTCCAGGCC CAGGTTCTAG TGAAAGCTAC TATGGAAGTT AGCCACTCAA 

851 CTTTAGAACC AGAGGCTTCT TTTCCTCCTC CCTTCTTATC TTTTCTAGTT 

901 TATAGCAAAT TTATATTGAG CCACTTATTC TTTCTGAATG CTAGTTCCCC 

951 TTTAGCATTT CTTTTTCTTC ATTCCCTTTG GACTGGCCCA ATGCTTTGGC 

1001 CCCTTATCAA AGCATTTTCT AAGAAACAGT CTGACAGCTC TAATTTGCAT 

1051 CTGGTTATGC AAGATGTGGT TAAGAACATG GACTCTGGAG GTAAATACAC 

1101 CTTGATTCCA ATTCATTCTC TCATTTATTC ATTCAGCAAA TATTTAGTGA 

1151 ACATCTAACA TGTGCTAGGC ACTGTTCTAG TTGCTGAGGA TACAGCTTCA 

1201 AACAAAATAA GGTCTCTGCA AGGATGCCTT CTCTTACCAC TCCTATTCAG 

1251 CGTAGTATTG GAAGTCCTGG CCAGGGCAAT CAGGCAAGAA AAAGAAATCA 

1301 AGGTCATCCA AATAGGAAGA GAGGAAGTCA AACTATCCCT GTTTACAGAC 

1351 AACATGATCC TACATCTAGA AAAAAACCCA TTGTCTTAGC CCAAAAGCTT 

1401 CTTAGGCTGA TAAACAACTT CAGCAAAGTC TTAGGATACA AAATCCATGT 

1451 GCAAAAAACA CTAGCATTCT TATACACCAA CAACAGTCAA GCCGAGATCC 

1501 AAATCAGGAA CAAACTCCTA TTCACAATTG CCACAAAAAC AATAGAACAG 

1551 GAAAACAGCT AACTAGGAAG GTGAAAGATC TCTACAAGGA GAACTACAAA 

1601 CCACTGCTCA CAGAAATCAG AGATGACACA TATAAATGGA AAAACATTCC 

1651 ATGATCATGG ATAGGAAGAA TGAATATTAC TGAAATGGCT ATACTGTCCA 

1701 AAGCAATTTA TAGATTCAAT GCTATTCCTA GTAAACTACC ATTGAGATTT 

1751 TTTACAGAAC TAGAAAAAAA AAAAACTATT TTAAGGCTGG GCGCAGTGGC 

1801 TCTCACCTGT AATCCCAGCA CTTTGGGAGG CCGAGATGGG TGGATCACGA 

1851 GGTCAGGAGA TGGAAAACAT CCTGGCTAAC ATGGTGAAAC CCCGTCTCTA 

1901 CTAAAAATAC AAAAAATTAG CCAGGCGTGG TGGTGGGCGC CTGTAATCCC 

1951 AGCTGCTCGG GAGGCTGAGG CAGGATAATG GTGTGAACCC GGGAGGCAGA 

2001 GCTTGCAGTG AGCTGAGATT GCACCACTGC ACTCCAGCCT GAGGGACAGA 

2051 GTGAGACTCC ATCTCAAAAA AAAAAAAAAA AAAAAAAA 
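
The listing above uses 50 bases per row in blocks of ten, each row prefixed by the position of its first base. A minimal sketch for turning such rows back into one contiguous string and checking the stated insert length; it assumes (as an accommodation for scanned text) that a split row label such as "14 01" should be read as 1401. `parse_listing` is a hypothetical helper, not part of any sequencing toolkit.

```python
def parse_listing(lines):
    """Join numbered sequence rows; return (position of first base, sequence)."""
    first_pos = None
    blocks = []
    for line in lines:
        tokens = line.split()
        digits = []
        # Leading all-digit tokens form the row label (possibly split by OCR).
        while tokens and tokens[0].isdigit():
            digits.append(tokens.pop(0))
        if digits and first_pos is None:
            first_pos = int("".join(digits))
        blocks.extend(tokens)
    return first_pos, "".join(blocks)


rows = [
    "2001 GCTTGCAGTG AGCTGAGATT GCACCACTGC ACTCCAGCCT GAGGGACAGA",
    "",
    "2051 GTGAGACTCC ATCTCAAAAA AAAAAAAAAA AAAAAAAA",
]
start, seq = parse_listing(rows)
print(start + len(seq) - 1)  # 2088, the stated insert length
```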



BLAST Results 



No BLAST result 



Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 832 bp to 1155 bp; peptide length: 108 
Category: putative protein 



1 MEVSHSTLEP EASFPPPFLS FLVYSKFILS HLFFLNASSP LAFLFLHSLW 
51 TGPMLWPLIK AFSKKQSDSS NLHLVMQDVV KNMDSGGKYT LIPIHSLIYS 
101 FSKYLVNI 



Entry B37192 from database PIR: 

thioredoxin - Bacillus subtilis 
Score = 71 (25.0 bits), Expect = 0.04, P = 0.039 
Identities = 16/49 (32%), Positives = 30/49 (61%) 
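
The Expect and P values reported for a hit are related through the Poisson model used in BLAST statistics, P = 1 - exp(-E). A short sketch of the conversion (the function name is illustrative); for very small E the two values are practically equal, which is why highly significant hits in this document list the same number for both.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit given the expected number of chance hits."""
    return -math.expm1(-expect)   # numerically stable form of 1 - exp(-E)


print(round(p_from_expect(0.04), 3))  # 0.039
```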



Alert BLASTP hits for DKFZphfbr2_16k22, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16k22, frame 1 



BLASTP hits 



Report for DKFZphfbr2_16k22.1 



[LENGTH] 108 

[MW] 12281.47 

[pi] 8.06 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 



SEQ MEVSHSTLEPEASFPPPFLSFLVYSKFILSHLFFLNASSPLAFLFLHSLWTGPMLWPLIK 

PRD ccccccccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhccccccchhhhh 

SEQ AFSKKQSDSSNLHLVMQDVVKNMDSGGKYTLIPIHSLIYSFSKYLVNI 

PRD hhhcccccccceeehhhhhhcccccccceeeeeccceeeecccccccc 



Prosite for DKFZphfbr2_16k22.1 



PS00001    36->40    ASN_GLYCOSYLATION    PDOC00001 
PS00004    64->68    CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    63->66    PKC_PHOSPHO_SITE     PDOC00005 
PS00006    6->10     CK2_PHOSPHO_SITE     PDOC00006 
PS00008    86->92    MYRISTYL             PDOC00008 
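
The site table for this 108-residue peptide can be reproduced with a simple pattern scan. The sketch below hand-translates the five PROSITE patterns into Python regular expressions and assumes the report prints each site as a 1-based start followed by the position one past the last matched residue (which is what the listed ranges imply). It is an illustration of the matching step only, not the PEDANT implementation.

```python
import re

# PROSITE patterns written as regular expressions (original syntax in comments).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",         # N-{P}-[ST]-{P}
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",          # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}

PEPTIDE = (
    "MEVSHSTLEPEASFPPPFLSFLVYSKFILSHLFFLNASSPLAFLFLHSLWTGPMLWPLIK"
    "AFSKKQSDSSNLHLVMQDVVKNMDSGGKYTLIPIHSLIYSFSKYLVNI"
)


def scan(peptide):
    """Return {pattern name: ["start->end", ...]} for every (possibly overlapping) site."""
    hits = {}
    for name, regex in PATTERNS.items():
        # A lookahead wrapper lets overlapping sites be reported individually.
        for m in re.finditer(f"(?=({regex}))", peptide):
            start = m.start() + 1             # 1-based start
            end = start + len(m.group(1))     # one past the last matched residue
            hits.setdefault(name, []).append(f"{start}->{end}")
    return hits


for name, sites in scan(PEPTIDE).items():
    print(name, sites)
```

Each pattern yields exactly one site on this peptide, at the positions listed in the table.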



(No Pfam data available for DKFZphfbr2_16k22.1) 



DKFZphfbr2_16l12 



PCT/IB00/01496 



group: transmembrane protein 

DKFZphfbr2_16l12 encodes a novel 267 amino acid protein with similarity to Gallus gallus 
putative transmembrane protein E3-16. 

The novel protein contains one putative transmembrane domain. In chicken, E3-16 is expressed 
specifically in the inner ear. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neurons involved in perception of hearing. 



similarity to Gallus putative transmembrane protein E3-16 
complete cDNA, complete cds, EST hits 

potential start at bp 73 matches Kozak consensus PyCCataG 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown 

insert length: 2042 bp 

Poly A stretch at pos. 2024, polyadenylation signal at pos. 2003 



1 GGGGGCGGCG GAGGCAGAGA CCGAGGCTGC ACCGGCAGAG GCTGCGGGGC 
51 GGACGCGCGG GCCGGCGCAG CCATGGTGAA GATTAGCTTC CAGCCCGCCG 
101 TGGCTGGCAT CAAGGGCGAC AAGGCTGACA AGGCGTCGGC GTCGGCCCCT 
151 GCGCCGGCCT CGGCCACCGA GATCCTGCTG ACGCCGGCTA GGGAGGAGCA 
201 GCCCCCACAA CATCGATCCA AGAGGGGGGG CTCAGTGGGC GGCGTGTGCT 
251 ACCTGTCGAT GGGCATGGTC GTGCTGCTCA TGGGCCTCGT GTTCGCCTCT 
301 GTCTACATCT ACAGATACTT CTTCCTTGCG CAGCTGGCCC GAGATAACTT 
351 CTTCCGCTGT GGTGTGCTGT ATGAGGACTC CCTGTCCTCC CAGGTCCGGA 
401 CTCAGATGGA GCTGGAAGAG GATGTGAAAA TCTACCTCGA CGAGAACTAC 
451 GAGCGCATCA ACGTGCCTGT GCCCCAGTTT GGCGGCGGTG ACCCTGCAGA 
501 CATCATCCAT GACTTCCAGC GGGGTCTGAC TGCGTACCAT GATATCTCCC 
551 TGGACAAGTG CTATGTCATC GAACTCAACA CCACCATTGT GCTGCCCCCT 
601 CGCAACTTCT GGGAGCTCCT CATGAACGTG AAGAGGGGGA CCTACCTGCC 
651 GCAGACGTAC ATCATCCAGG AGGAGATGGT GGTCACGGAG CATGTCAGTG 
701 ACAAGGAGGC CCTGGGGTCC TTCATCTACC ACCTGTGCAA CGGGAAAGAC 
751 ACCTACCGGC TCCGGCGCCG GGCAACGCGG AGGCGGATCA ACAAGCGTGG 
801 GGCCAAGAAC TGCAATGCCA TCCGCCACTT CGAGAACACC TTCGTGGTGG 
851 AGACGCTCAT CTGCGGGGTG GTGTGAGGCC CTCCTCCCCC AGAACCCCCT 
901 GCCGTGTTCC TCTTTTCTTC TTTCCGGCTG CTCTCTGGCC CTCCTCCTTC 
951 CCCCTGCTTA GCTTGTACTT TGGACGCGTT TCTATAGAGG TGACATGTCT 
1001 CTCCATTCCT CTCCAACCCT GCCCACCTCC CTGTACCAGA GCTGTGATCT 
1051 CTCGGTGGGG GGCCCATCTC TGCTGACCTG GGTGTGGCGG AGGGAGAGGC 
1101 GATGCTGCAA AGTGTTTTCT GTGTCCCACT GTCTTGAAGC TGGGCCTGCC 
1151 AAAGCCTGGG CCCACAGCTG CACCGGCAGC CCAAGGGGAA GGACCGGTTG 
1201 GGGGAGCCGG GCATGTGAGG CCCTGGGCAA GGGGATGGGG CTGTGGGGGC 
1251 GGGGCGGCAT GGGCTTCAGA AGTATCTGCA CAATTAGAAA AGTCCTCAGA 
1301 AGCTTTTTCT TGGAGGGTAC ACTTTCTTCA CTGTCCCTAT TCCTAGACCT 
1351 GGGGCTTGAG CTGAGGATGG GACGATGTGC CCAGGGAGGG ACCCACCAGA 
1401 GCACAAGAGA AGGTGGCTAC CTGGGGGTGT CCCAGGGACT CTGTCAGTGC 
1451 CTTCAGCCCA CCAGCAGGAG CTTGGAGTTT GGGGAGTGGG GATGAGTCCG 
1501 TCAAGCACAA CTGTTCTCTG AGTGGAACCA AAGAAGCAAG GAGCTAGGAC 
1551 CCCCAGTCCT GCCCCCCAGG AGCACAAGCA GGGTCCCCTC AGTCAAGGCA 
1601 GTGGGATGGG CGGCTGAGGA ACGGGGCAGG CAAGGTCACT GCTCAGTCAC 
1651 GTCCACGGGG GACGAGCCGT GGGTTCTGCT GAGTAGGTGG AGCTCATTGC 
1701 TTTCTCCAAG CTTGGAACTG TTTTGAAAGA TAACACAGAG GGAAAGGGAG 
1751 AGCCACCTGG TACTTGTCCA CCCTGCCTCC TCTGTTCTGA AATTCCATCC 
1801 CCCTCAGCTT AGGGGAATGC ACCTTTTTCC CTTTCCTTCT CACTTTTGCA 
1851 TGTTTTTACT GATCATTCGA TATGCTAACC GTTCTCAGCC CTGAGCCTTG 
1901 GAGAGGAGGG CTGTAACGCC TTCAGTCAGT CTCTGGGGAT GAAACTCTTA 
1951 AATGCTTTGT ATATTTTCTC AATTAGATCT CTTTTCAGAA GTGTCTATAG 
2001 AACAATAAAA ATCTTTTACT TCTGAAAAAA AAAAAAAAAA AA 
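
The poly(A) stretch and polyadenylation signal positions quoted for this insert can be located with a short scan of the 3' end. A minimal sketch, assuming the canonical AATAAA hexamer is the signal being reported, that a terminal run of at least ten A residues counts as the poly(A) stretch (the threshold is an assumption), and that the quoted positions are 0-based offsets (the last row of the listing is consistent with that reading). `polya_features` is a hypothetical helper.

```python
import re


def polya_features(tail, offset, min_run=10):
    """Return 0-based positions of (AATAAA signal, terminal poly(A) run) or None.

    `offset` is the 0-based position of tail[0] within the full insert.
    """
    signal = tail.find("AATAAA")
    run = re.search(rf"A{{{min_run},}}$", tail)
    return (
        offset + signal if signal >= 0 else None,
        offset + run.start() if run else None,
    )


# Last row of the listing (bases 2001 to 2042), written block by block.
tail = "AACAATAAAA" "ATCTTTTACT" "TCTGAAAAAA" "AAAAAAAAAA" "AA"
print(polya_features(tail, offset=2000))  # (2003, 2024)
```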



BLAST Results 



No BLAST result 

Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation 
using cDNA library subtraction. Molecular cloning and 
characterization of a gene belonging to a novel multigene 
family of integral membrane proteins. 



Peptide information for frame 1 



ORF from 73 bp to 873 bp; peptide length: 267 
Category: similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGGSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 
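
The peptide above is the conceptual translation of the ORF in frame 1. As a hedged illustration, the sketch below translates the first thirty bases of the ORF (bp 73 to 102 of the insert listing) using the standard genetic code; the table is built from the conventional TCAG ordering, and the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with the first base varying slowest.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


orf_start = "ATGGTGAAGATTAGCTTCCAGCCCGCCGTG"  # bp 73-102 of the insert
print(translate(orf_start))  # MVKISFQPAV
```

The seventh codon, CAG, encodes glutamine (Q).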

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16l12, frame 1 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16)., N = 1, Score = 573, P = 1.4e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 
1, Score = 559, P = 4.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, 
Score = 452, P = 9.1e-43 



>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16). 

Length = 262 

HSPs: 



Score = 573 (86.0 bits), Expect = 1.4e-55, P = 1.4e-55 
Identities = 118/264 (44%), Positives = 175/264 (66%) 



Query:   1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY 60 
           MVK+SF  A+A    + A+K   ++       ++L+ P  + + P+      G     C+ 
Sbjct:   1 MVKVSFNSALA--HKEAANKEEENS-------QVLILPP-DAKEPEDVVVPAGHKRAWCW 50 

Query:  61 -LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS-----SQVRTQM- 112 
            +  G+  +L G++    Y+Y+YF   Q      + CG+ Y ED LS     +Q+++ 
Sbjct:  51 CMCFGLAFMLAGVILGGAYLYKYFAFQQ---GGVYFCGIKYIEDGLSLPESGAQLKSARY 107 

Query: 113 -ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTT 171 
             +E++++I  +E+ E I+VPVP+F   DPADI+HDF R LTAY D+SLDKCYVI LNT+ 
Sbjct: 108 HTIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTS 167 

Query: 172 IVLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLR 231 
           +V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+ 
Sbjct: 168 VVMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQ 227 

Query: 232 RRATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 
           R+   + I KR A NC  IRHFEN F +ETLIC 
Sbjct: 228 RKEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 





Pedant information for DKFZphfbr2_16l12, frame 1 



Report for DKFZphfbr2_16l12.1 



[LENGTH] 267 

[MW] 30223.94 

[pi] 8.16 

[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16) 1e-49 

[PROSITE] PRENYLATION 1 

[PROSITE] MYRISTYL 5 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 15.36 % 



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMMMMMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . .xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG xx 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphfbr2_16l12.1 



PS00001    169->173    ASN_GLYCOSYLATION    PDOC00001 
PS00004    187->191    CAMP_PHOSPHO_SITE    PDOC00004 
PS00004    232->236    CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    49->52      PKC_PHOSPHO_SITE     PDOC00005 
PS00005    209->212    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    227->230    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    235->238    PKC_PHOSPHO_SITE     PDOC00005 
PS00006    30->34      CK2_PHOSPHO_SITE     PDOC00006 
PS00006    110->114    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    209->213    CK2_PHOSPHO_SITE     PDOC00006 
PS00007    119->127    TYR_PHOSPHO_SITE     PDOC00007 
PS00008    52->58      MYRISTYL             PDOC00008 
PS00008    53->59      MYRISTYL             PDOC00008 
PS00008    71->77      MYRISTYL             PDOC00008 
PS00008    138->144    MYRISTYL             PDOC00008 
PS00008    243->249    MYRISTYL             PDOC00008 
PS00294    264->268    PRENYLATION          PDOC00266 



(No Pfam data available for DKFZphfbr2_16l12.1) 

DKFZphfbr2_22f21 



group: brain derived 

DKFZphfbr2_22f21 encodes a novel 567 amino acid protein with weak similarity to C. elegans 
cosmid C18C4.5. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



weak similarity to C. elegans C18C4.5 

EST HSAA6531/HSAA5273/ defines splice variant, or unspliced cDNA; additional ~180 bp at 
position 250 

Sequenced by AGOWA 

Locus: /map="311.4 cR from top of Chr14 linkage group" 
Insert length: 1910 bp 

Poly A stretch at pos. 1887, polyadenylation signal at pos. 1867 



1 TGGGCCCTTA GCAACGGCCT GGCGACGGTT TCCTTGCTGC TGCAGCCCCC 
51 GTCGGCTCCT CTTTTCCAGT CCTCCACTGC CGGGGCTGGG CCCGGCCGCG 
101 GGAAGGACCG AAGGGGATAC AGCGTGTCCC TGCGGCGGCT GCAAGAGGAC 
151 TAAGCATGGA TGGCAGCCGG AGAGTCAGAG CAACCTCTGT CCTTCCCAGA 
201 TATGGTCCAC CGTGCCTATT TAAAGGACAC TTGAGCACCA AAAGTAATGC 
251 TGCAGTAGAC TGCTCGGTTC CAGTAAGCAT GAGTACCAGC ATAAAGTATG 
301 CAGACCAACA ACGAAGAGAG AAACTCAAAA AGGAATTAGC ACAATGTGAA 
351 AAAGAGTTCA AATTAACTAA AACTGCAATG CGAGCCAATT ATAAAAATAA 
401 TTCCAAGTCA CTTTTTAATA CCTTACAAGA GCCCTCAGGC GAACCGCAAA 
451 TTGAGGATGA CATGTTAAAA GAAGAAATGA ATGGATTTTC ATCCTTTGCA 
501 AGGTCACTAG TACCCTCTTC AGAGAGACTA CACCTAAGTC TACATAAATC 
551 CAGTAAAGTC ATCACAAATG GTCCTGAGAA GAACTCCAGT TCCTCCCCGT 
601 CCAGTGTGGA TTATGCAGCC TCCGGGCCCC GGAAACTGAG CTCTGGAGCC 
651 CTGTATGGCA GAAGGCCCAG AAGCACATTC CCAAATTCCC ACCGGTTTCA 
701 GTTAGTCATT TCGAAAGCAC CCAGTGGGGA TCTTTTGGAT AAACATTCTG 
751 AACTCTTTTC TAACAAACAA TTGCCATTCA CTCCTCGCAC TTTAAAAACA 
801 GAAGCAAAAT CTTTCCTGTC ACAGTATCGC TATTATACAC CTGCCAAAAG 
851 AAAAAAGGAT TTTACAGATC AACGGATAGA AGCTGAAACC CAGACTGAAT 
901 TAAGCTTTAA ATCTGAGTTG GGGACAGCTG AGACTAAAAA CATGACAGAT 
951 TCAGAAATGA ACATAAAGCA GGCATCTAAT TGTGTGACAT ATGATGCCAA 
1001 AGAAAAAATA GCTCCTTTAC CTTTAGAAGG GCATGACTCA ACATGGGATG 
1051 AGATTAAGGA TGATGCTCTT CAGCATTCCT CACCAAGGGC AATGTGTCAG 
1101 TATTCCCTGA AGCCCCCTTC AACTCGTAAA ATCTACTCTG ATGAAGAAGA 
1151 ACTGTTGTAT CTGAGTTTCA TTGAAGATGT AACAGATGAA ATTTTGAAAC 
1201 TTGGTTTATT TTCAAACAGG TTTTTAGAAC GACTGTTCGA GCGACATATA 
1251 AAACAAAATA AACATTTGGA GGGGGAAAAA ATGCGCCACC TGCTGCATGT 
1301 CCTGAAAGTA GACTTAGGCT GCACATCGGA GGAAAACTCG GTAAAGCAAA 
1351 ATGATGTTGA TATGTTGAAT GTATTTGATT TTGAAAAGGC TGGGAATTCA 
1401 GAACCAAATA AATTAAAAAA TGAAAGTGAA GTAACAATTC AGCAGGAACG 
1451 TCAACAATAC CAAAAGGCTT TGGATATGTT ATTGTCGGCA CCAAAGGATG 
1501 AGAACGAGAT ATTCCCTTCA CCAACTGAAT TTTTCATGCC TATTTATAAA 
1551 TCAAAGCATT CAGAAGGGGT TATAATTCAA CAGGTGAATG ATGAAACAAA 
1601 TCTTGAAACT TCAACTTTGG ATGAAAATCA TCCAAGTATT TCAGACAGTT 
1651 TAACAGATCG GGAAACTTCT GTGAATGTCA TTGAAGGTGA TAGTGACCCT 
1701 GAAAAGGTTG AGATTTCAAA TGGATTATGT GGTCTTAACA CATCACCCTC 
1751 CCAATCTGTT CAGTTCTCCA GTGTCAAAGG CGACAATAAT CATGACATGG 
1801 AGTTATCAAC TCTTAAAATC ATGGAAATGA GCATTGAGGA CTGCCCTTTG 
1851 GATGTTTAAT CTTCATTAAT AAATACCTCA AATGGCCAGT AAAAAAAAAA 
1901 AAAAAAAAAA 



BLAST Results 



Entry HS477360 from database EMBL: 
human STS WI-14643. 
Length = 418 
Minus Strand HSPs: 

Score = 1850 (277.6 bits), Expect = 2.5e-77, P = 2.5e-77 

Identities = 392/405 (96%), Positives = 392/405 (96%), Strand = Minus / Plus 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 156 bp to 1856 bp; peptide length: 567 
Category: similarity to unknown protein 



1 MDGSRRVRAT SVLPRYGPPC LFKGHLSTKS NAAVDCSVPV SMSTSIKYAD 
51 QQRREKLKKE LAQCEKEFKL TKTAMRANYK NNSKSLFNTL QEPSGEPQIE 
101 DDMLKEEMNG FSSFARSLVP SSERLHLSLH KSSKVITNGP EKNSSSSPSS 
151 VDYAASGPRK LSSGALYGRR PRSTFPNSHR FQLVISKAPS GDLLDKHSEL 
201 FSNKQLPFTP RTLKTEAKSF LSQYRYYTPA KRKKDFTDQR IEAETQTELS 
251 FKSELGTAET KNMTDSEMNI KQASNCVTYD AKEKIAPLPL EGHDSTWDEI 
301 KDDALQHSSP RAMCQYSLKP PSTRKIYSDE EELLYLSFIE DVTDEILKLG 
351 LFSNRFLERL FERHIKQNKH LEGEKMRHLL HVLKVDLGCT SEENSVKQND 
401 VDMLNVFDFE KAGNSEPNKL KNESEVTIQQ ERQQYQKALD MLLSAPKDEN 
451 EIFPSPTEFF MPIYKSKHSE GVIIQQVNDE TNLETSTLDE NHPSISDSLT 
501 DRETSVNVIE GDSDPEKVEI SNGLCGLNTS PSQSVQFSSV KGDNNHDMEL 
551 STLKIMEMSI EDCPLDV 



BLASTP hits 



Entry CEC18C4_3 from database TREMBL: 

"C18C4.5"; Caenorhabditis elegans cosmid C18C4. 

Length = 1091 

Score = 98 (34.5 bits), Expect = 0.29, P = 0.25 
Identities = 105/470 (22%), Positives = 192/470 (40%) 



Alert BLASTP hits for DKFZphfbr2_22f21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22f21, frame 3 



Report for DKFZphfbr2_22f21.3 



[LENGTH] 567 

[MW] 64120.02 

[pi] 5.68 

[PROSITE] AMIDATION 1 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] PKC_PHOSPHO_SITE 18 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 1.23 % 



SEQ MDGSRRVRATSVLPRYGPPCLFKGHLSTKSNAAVDCSVPVSMSTSIKYADQQRREKLKKE 

SEG 

PRD cccccceeeeeeccccccccccccccccccceeeecccccccchhhhhhhhhhhhhhhhh 

SEQ LAQCEKEFKLTKTAMRANYKNNSKSLFNTLQEPSGEPQIEDDMLKEEMNGFSSFARSLVP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccceeecccccccchhhhhhhhhhhccccccceeecc 

SEQ SSERLHLSLHKSSKVITNGPEKNSSSSPSSVDYAASGPRKLSSGALYGRRPRSTFPNSHR 

SEG xxxxxxx 

PRD ccchhhhhhhhceeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ FQLVISKAPSGDLLDKHSELFSNKQLPFTPRTLKTEAKSFLSQYRYYTPAKRKKDFTDQR 

SEG 

PRD cceeeeeccccccccccccccccccccccccchhhhhhhhhhhhhccccccchhhhhhhh 

SEQ IEAETQTELSFKSELGTAETKNMTDSEMNIKQASNCVTYDAKEKIAPLPLEGHDSTWDEI 

SEG 

PRD hhhhhhhhhhhhhhccccccccccchhhhhhhccceeehhhhhhcccccccccccccccc 

SEQ KDDALQHSSPRAMCQYSLKPPSTRKIYSDEEELLYLSFIEDVTDEILKLGLFSNRFLERL 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhh 

SEQ FERHIKQNKHLEGEKMRHLLHVLKVDLGCTSEENSVKQNDVDMLNVFDFEKAGNSEPNKL 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhccccccccccccccccccccceeeecccccccccccc 

SEQ KNESEVTIQQERQQYQKALDMLLSAPKDENEIFPSPTEFFMPIYKSKHSEGVIIQQVNDE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccceeeeecccc 

SEQ TNLETSTLDENHPSISDSLTDRETSVNVIEGDSDPEKVEISNGLCGLNTSPSQSVQFSSV 

SEG 

PRD ccccccccccccccccccccccccceeecccccccceeeeccccccccccccceeeeecc 

SEQ KGDNNHDMELSTLKIMEMSIEDCPLDV 

SEG 

PRD ccccccchhhhhhhhhhhhhccccccc 
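
The PRD rows encode a per-residue secondary structure prediction (`h` helix, `e` strand, `c` coil), and keywords such as All_Alpha presumably summarize their overall composition. A minimal sketch for tallying that composition from the rows of a report; the function name and the rounding to one decimal place are assumptions made for illustration.

```python
from collections import Counter


def prd_composition(rows):
    """Return {state: (count, percent)} for helix (h), strand (e) and coil (c)."""
    counts = Counter("".join(rows).replace(" ", ""))
    total = sum(counts.values())
    return {s: (counts[s], round(100 * counts[s] / total, 1)) for s in "hec"}


# A short row with the same coil/helix/coil shape as the final PRD row above.
print(prd_composition(["c" * 7 + "h" * 13 + "c" * 7]))
```

Passing all PRD rows of a report at once gives the composition over the whole peptide.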



Prosite for DKFZphfbr2_22f21.3 



PS00001    81->85      ASN_GLYCOSYLATION    PDOC00001 
PS00001    143->147    ASN_GLYCOSYLATION    PDOC00001 
PS00001    262->266    ASN_GLYCOSYLATION    PDOC00001 
PS00001    422->426    ASN_GLYCOSYLATION    PDOC00001 
PS00004    159->163    CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    4->7        PKC_PHOSPHO_SITE     PDOC00005 
PS00005    27->30      PKC_PHOSPHO_SITE     PDOC00005 
PS00005    45->48      PKC_PHOSPHO_SITE     PDOC00005 
PS00005    122->125    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    132->135    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    178->181    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    202->205    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    209->212    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    212->215    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    250->253    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    309->312    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    317->320    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    322->325    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    353->356    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    395->398    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    500->503    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    539->542    PKC_PHOSPHO_SITE     PDOC00005 
PS00005    552->555    PKC_PHOSPHO_SITE     PDOC00005 
PS00006    89->93      CK2_PHOSPHO_SITE     PDOC00006 
PS00006    149->153    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    245->249    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    264->268    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    295->299    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    328->332    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    337->341    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    390->394    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    455->459    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    481->485    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    486->490    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    494->498    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    498->502    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    500->504    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    513->517    CK2_PHOSPHO_SITE     PDOC00006 
PS00006    559->563    CK2_PHOSPHO_SITE     PDOC00006 
PS00008    164->170    MYRISTYL             PDOC00008 
PS00008    256->262    MYRISTYL             PDOC00008 
PS00008    350->356    MYRISTYL             PDOC00008 
PS00009    167->171    AMIDATION            PDOC00009 



(No Pfam data available for DKFZphfbr2_22f21.3) 

DKFZphfbr2_22h13 



group: transmembrane protein 

DKFZphfbr2_22h13 encodes a novel 520 amino acid protein, with similarity to Drosophila 
melanogaster EG:39E1.3. 

The protein contains an ATP/GTP A Prosite pattern (P-loop). This loop interacts with one of 
the phosphate groups of an A or G nucleotide. It is found in numerous ATP- or GTP-binding 
proteins, such as ATP synthase alpha and beta subunits, myosin heavy chains, kinesin heavy 
chains and kinesin-like proteins, dynamins and dynamin-like proteins, several kinases, DNA 
and RNA helicases, GTP-binding elongation factors and the Ras family of GTP-binding proteins. 
Additionally, the novel protein contains one putative transmembrane domain. 
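
The P-loop motif mentioned above corresponds to the PROSITE pattern ATP_GTP_A (PS00017), [AG]-x(4)-G-K-[ST]. A minimal sketch of how that pattern can be located, using residues 201 to 220 of the peptide given for this entry; the function name is illustrative, and the end coordinate is reported as one past the last matched residue, which matches the "211-219" range quoted for this protein.

```python
import re

P_LOOP = re.compile(r"[AG].{4}GK[ST]")  # PROSITE PS00017: [AG]-x(4)-G-K-[ST]


def find_p_loops(fragment, first_residue):
    """Return (start, end) pairs; start is 1-based, end is one past the last residue."""
    return [
        (first_residue + m.start(), first_residue + m.end())
        for m in P_LOOP.finditer(fragment)
    ]


fragment = "QTDVLVVGVLGLQGTGKSMV"  # residues 201-220 of the peptide
print(find_p_loops(fragment, 201))  # [(211, 219)]
```

The matched residues are GLQGTGKS, with the invariant lysine at position 217.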

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



AC004780_1, differences to predicted gene model 
membrane regions: 1 

AC004780_1, differences to predicted gene model 

complete cDNA, complete cds, EST hits 
on genomic level encoded by AC004780, 
differences to predicted gene model! 
TRANSMEMBRANE 1 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2292 bp 

Poly A stretch at pos. 2272, polyadenylation signal at pos. 2255 



1 GGGGGAGGGA ACTGATCTCA GCTCGGGCCC GCGTTACATC CTCCTCCTCT 
51 TCTTCCTTCG GCCCAGCTTT CCTTAGGGGC TGCAACCCGG ACGCCGAGGC 
101 CGGTTTCGGA GTGGGGAGTG CCCATTTTCT CTCCTTCCCA CGTTCCTGGC 
151 CCCCAGACGC CATTTGCAGG CGGGTGGCTT GGGTCAGCCT CCCCGCCCCC 
201 ACCCGACTCC CGTCACGGGA GAGCGCACAC CGCGCCCCGA GAACCAATCA 
251 GCAGCCGCGT TAGGTAACCA TGTCTGAGTC TGGACACAGT CAGCCTGGAC 
301 TCTATGGGAT AGAGCGGCGG CGACGGTGGA AGGAGCCTGG CTCTGGTGGC 
351 CCCCAGAATC TCTCTGGGCC TGGTGGTCGG GAGAGGGACT ACATTGCACC 
401 ATGGGAAAGA GAGAGAAGGG ATGCCAGCGA AGAGACAAGC ACTTCCGTCA 
451 TGCAGAAAAC CCCCATCATC CTCTCAAAAC CTCCAGCAGA GCGGTCAAAA 
501 CAGCCACCAC CTCCAACAGC CCCTGCTGCC CCGCCTGCTC CAGCCCCTCT 
551 GGAGAAGCCC ATCGTTCTCA TGAAGCCACG GGAGGAGGGG AAGGGGCCTG 
601 TGGCCGTGAC AGGTGCCTCT ACCCCTGAGG GCACCGCCCC ACCACCCCCT 
651 GCAGCCCCTG CGCCACCCAA GGGGGAGAAG GAGGGGCAGA GACCCACACA 
701 GCCTGTGTAC CAGATCCAGA ACCGGGGCAT GGGCACTGCC GCACCAGCAG 
751 CCATGGACCC TGTCGTGGGT CAGGCCAAAC TACTGCCCCC AGAGCGCATG 
801 AAGCACAGCA TCAAGTTGGT GGATGACCAG ATGAATTGGT GTGACAGTGC 
851 CATCGAGTAC CTGTTGGATC AGACTGATGT GTTGGTGGTT GGTGTCCTGG 
901 GCCTCCAGGG GACAGGCAAG TCCATGGTCA TGTCATTGTT GTCAGCCAAC 
951 ACTCCAGAGG AGGACCAGAG GACTTATGTT TTCCGGGCCC AGAGCGCTGA 
1001 AATGAAGGAA CGAGGGGGCA ACCAGACCAG TGGCATCGAC TTCTTTATTA 
1051 CCCAAGAACG GATTGTTTTC CTGGACACAC AGCCCATCCT GAGCCCTTCT 
1101 ATCCTAGACC ATCTCATCAA TAATGACCGC AAACTGCCTC CAGAGTACAA 
1151 CCTTCCCCAC ACTTACGTTG AAATGCAGTC ACTCCAGATT GCTGCCTTCC 
1201 TTTTCACGGT CTGCCATGTG GTGATTGTTG TCCAGGACTG GTTCACAGAC 
1251 CTCAGTCTCT ACAGGTTCCT GCAGACAGCA GAGATGGTGA AGCCCTCCAC 
1301 CCCATCCCCC AGCCACGAGT CCAGCAGCTC ATCGGGCTCC GATGAAGGCA 
1351 CCGAGTACTA CCCCCACCTA GTCTTCTTGC AGAACAAAGC TCGCCGAGAG 
1401 GACTTCTGTC CTCGGAAGCT GCGGCAGATG CACCTGATGA TTGACCAGCT 
1451 CATGGCCCAC TCCCACCTGC GTTACAAGGG AACTCTGTCC ATGTTACAAT 
1501 GCAATGTCTT CCCGGGGCTT CCACCTGACT TCCTGGACTC TGAGGTCAAC 
1551 TTATTCCTGG TACCCTTCAT GGACAGTGAA GCAGAGAGTG AAAACCCACC 
1601 AAGAGCAGGA CCTGGTTCCA GCCCACTCTT CTCCCTGCTG CCTGGGTATC 
1651 GTGGCCACCC CAGTTTCCAG TCCTTGGTGA GCAAGCTCCG GAGCCAAGTG 
1701 ATGTCCATGG CCCGGCCACA GCTGTCACAC ACGATCCTCA CCGAGAAGAA 
1751 CTGGTTCCAC TACGCTGCCC GGATCTGGGA TGGGGTGAGA AAGTCCTCTG 
1801 CTCTGGCAGA GTACAGCCGC CTGCTGGCCT GAGGCCAAGG AGAGGAATGT 
1851 CATGCAGGGG ACCTCCTGGG TCCGCAGTGT ACTGCGAGGG AGCACAGATG 
1901 TCCATCCCCC GCTGGGGTGG AGAGCGGCAG CAGGCCTGAT GGATGAGGGA 
1951 TCGTGGCTTC CCGGCCCAGA GACATGAGGT GTCCAGGGCC AGGCCCCCCA 
2001 CCCTCAGTTG GGGCTGTTCC GGGGGTGACT GTGAGCGATC CCACCCCAAA 

2051 CCTGAGATGG GGTAGCCCGT CCTGTGTCCT CCACAGGGAC AAGCAGTGGG 

2101 AGGAGTCTGA ATGGTCACCA GGAAGCCCGG GCTCCATCTT GACCTCCTTT 

2151 TTCAGGGACA GGAGCAACAG GCCCCTCTTC CCTGACTCTA AGCCCTTCCC 

2201 TGTAAGGTGA GGCAGGGTCT GGAGAGCTCT TTATTGGAAC AGATCTGGTG 

2251 GTTCAAATAA ACACAGTCAT GCAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AC004780 from database EMBL: 

Homo sapiens chromosome 19, cosmid F17127, complete sequence. 
Score = 2S16, P = 0.0e+00, identities = 524/525 
15 exons Bp 8031-31789 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 270 bp to 1829 bp; peptide length: 520 
Category: similarity to unknown protein 
Prosite motifs: ATP_GTP_A (211-219) 



1 MSESGHSQPG LYGIERRRRW KEPGSGGPQN LSGPGGRERD YIAPWERERR 
51 DASEETSTSV MQKTPIILSK PPAERSKQPP PPTAPAAPPA PAPLEKPIVL 
101 MKPREEGKGP VAVTGASTPE GTAPPPPAAP APPKGEKEGQ RPTQPVYQIQ 
151 NRGMGTAAPA AMDPVVGQAK LLPPERMKHS IKLVDDQMNW CDSAIEYLLD 
201 QTDVLVVGVL GLQGTGKSMV MSLLSANTPE EDQRTYVFRA QSAEMKERGG 
251 NQTSGIDFFI TQERIVFLDT QPILSPSILD HLINNDRKLP PEYNLPHTYV 
301 EMQSLQIAAF LFTVCHVVIV VQDWFTDLSL YRFLQTAEMV KPSTPSPSHE 
351 SSSSSGSDEG TEYYPHLVFL QNKARREDFC PRKLRQMHLM IDQLMAHSHL 
401 RYKGTLSMLQ CNVFPGLPPD FLDSEVNLFL VPFMDSEAES ENPPRAGPGS 
451 SPLFSLLPGY RGHPSFQSLV SKLRSQVMSM ARPQLSHTIL TEKNWFHYAA 
501 RIWDGVRKSS ALAEYSRLLA 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_22h13, frame 3 

TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, 
cosmid F17127, complete sequence., N = 2, Score = 1264, P = 1.3e-231 

TREMBL:CEY54E2A_1 gene: "Y54E2A.2"; Caenorhabditis elegans cosmid 
Y54E2A, N = 2, Score = 219, P = 1.4e-15 



>TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid 
F17127, complete sequence. 
Length = 528 



HSPs : 



Score = 1264 (189.6 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 254/302 (84%), Positives = 264/302 (87%) 



Query: 46 ERERRDASEETSTSVMQKTPI ILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPRE 105 

E+ER D+ + S +Q+T + R + P + A APLEKPIVLMKPRE 

Sbjct: 39 EKER-DSDSDFSP — LQQTEGCQRRDKHFRHAENPHHPLKTSSRA- APLEKPIVLMKPRE 94 



Query: 106 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 165 

EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 
Sbjct: 95 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 154 

Query: 166 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 225 

VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 
Sbjct: 155 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 214 






Query: 226 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 285 

ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 
Sbjct: 215 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 274 

Query: 286 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTP 345 

DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYR K ++ 

Sbjct: 275 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRLWDLGCKCKSNSH 334 

Query: 346 SP 347 
SP 

Sbjct: 335 SP 336 

Score = 993 (149.0 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231 
Identities = 189/189 (100%), Positives = 189/189 (100%) 



Query: 332 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 391
           RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI
Sbjct: 340 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 399

Query: 392 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 451
           DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS
Sbjct: 400 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 459

Query: 452 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 511
           PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA
Sbjct: 460 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 519

Query: 512 LAEYSRLLA 520
           LAEYSRLLA
Sbjct: 520 LAEYSRLLA 528





Pedant information for DKFZphfbr2_22hl3, frame 3 



Report for DKFZphfbr2_22hl3.3

[LENGTH] 520
[MW] 57650.31
[pi] 6.52
[HOMOL] TREMBL:AC004780_1 product: "F17127_l"; Homo sapiens chromosome 19, cosmid F17127, complete sequence. 0.0
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 8
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 2
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 11.73 %



SEQ MSESGHSQPGLYGIERRRRWKEPGSGGPQNLSGPGGRERDYIAPWERERRDASEETSTSV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccceeeeehhhhhhhhhccccccee 

MEM 

SEQ MQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPREEGKGPVAVTGASTPE 

SEG xxxxxxxxxxxxxxx 

PRD eeccceeecccccccccccccccccccccccccccceeeeeccccccccceeeecccccc 

MEM 

SEQ GTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPVVGQAKLLPPERMKHS 

SEG . . xxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeeeccccccccccccceeecceeecccchhhhh 

MEM 

SEQ IKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLSANTPEEDQRTYVFRA 

SEG xxxxxxxxxxxxxxxxxxx 

PRD hhhhcccchhhhhhhhhhccccceeeeeecccccccchhhhhhhhccccchhhhhheeee 

MEM 

SEQ QSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINNDRKLPPEYNLPHTYV 

SEG 

PRD hhhhhhhcccccceeeeeeeecceeeeeeccccccccccccccccccccccccccccchh 

MEM 

SEQ EMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTPSPSHESSSSSGSDEG 

SEG xxxxxxxxxxxxxxxx . . . 
PRD hhhhhhhhhhhhhhhheeeeeeeccchhhhhhhhhhhhhhhccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ TEYYPHLVFLQNKARREDFCPRKLRQMHLMIDQLMAHSHLRYKGTLSMLQCNVFPGLPPD 

SEG 

PRD cccccceeeehhhhhhhcccccchhhhhhhhhhhhhhhhhhccccccccccccccccccc 

MEM 

SEQ FLDSEVNLFLVPFMDSEAESENPPRAGPGSSPLFSLLPGYRGHPSFQSLVSKLRSQVMSM 

SEG 

PRD chhhhhheeeeeccccccccccccccccccccceeeccccccccchhhhhhhhhhhhhhh 

MEM 

SEQ ARPQLSHTILTEKNWFHYAARIWDGVRKSSALAEYSRLLA 

SEG 

PRD hhhhhhhheeeccchhhhhhhhhhhhcchhhhhhhhhccc 

MEM 



Prosite for DKFZphfbr2_22hl3.3

PS00001 30->34 ASN_GLYCOSYLATION PDOC00001
PS00001 251->255 ASN_GLYCOSYLATION PDOC00001
PS00002 32->36 GLYCOSAMINOGLYCAN PDOC00002
PS00004 507->511 CAMP_PHOSPHO_SITE PDOC00004
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00005 215->218 PKC_PHOSPHO_SITE PDOC00005
PS00005 491->494 PKC_PHOSPHO_SITE PDOC00005
PS00006 117->121 CK2_PHOSPHO_SITE PDOC00006
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 254->258 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 298->302 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 436->440 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 139->145 MYRISTYL PDOC00008
PS00008 153->159 MYRISTYL PDOC00008
PS00008 211->217 MYRISTYL PDOC00008
PS00008 214->220 MYRISTYL PDOC00008
PS00008 249->255 MYRISTYL PDOC00008
PS00008 356->362 MYRISTYL PDOC00008
PS00008 505->511 MYRISTYL PDOC00008
PS00017 211->219 ATP_GTP_A PDOC00017



(No Pfam data available for DKFZphfbr2_22hl3.3) 
DKFZphfbr2_22i4 



group: brain derived 

DKFZphfbr2_22i4.1 encodes a novel 228 amino acid protein with similarity to the N-terminus of 
human p52rIPK. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to Human P52rIPK N-terminus 
complete cDNA, complete cds, few EST hits 

function of P52rIPK, repressor of p58IPK protein kinase inhibitor 
upstream regulator of interferon induced proteins 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 4748 bp 

Poly A stretch at pos. 4726, polyadenylation signal at pos. 4709 



1 TGGGTCCGGT CCTAGGGTCA CACCCACCGC AGGGTCTGGC TTGGTACAGT 
51 TGGGTGCATG CAGAAGTAGG TGGAGCTGCT GTTGCAGCCT TGAGAGAGTT 
101 TTATTGTAAA ACTCTTGTAA TTTATAGTAA TCGGAGGGGA AAACACCTCT 
151 TCCTTTTAAT TGCTCTGAGG ACCGCTGCCA AAGAAACGCA GTAGATCCGC 
201 TCCCTCTTGG GGGCGGGGAG AAAGAACGGG TTGTGTCCGC CATGTTGGTG 
251 AAGTCAAGCG AAGGCGACTA GAGCTCCAGG AGGGCCAGTT CTGTGGGCTC 
301 TAGTCGGCCA TATTAATAAA GAGAAAGGGA AGGCTGACCG TCCTTCGCCT 
351 CCGCCCCCAC ATACACACCC CTTCTTCCCA CTCCGCTCTC ACGACTAAGC 
401 TCTCACGATT AAGGCACGCC TGCCTCGATT GTCCAGCCTC TGCCAGAAGA 
451 AAGCTTAGCA GCCAGCGCCT CAGTAGAGAC CTAAGGGCGC TGAATGAGTG 
501 GGAAAGGGAA ATGCCGACCA ATTGCGCTGC GGCGGGCTGT GCCACTACCT 
551 ACAACAAGCA CATTAACATC AGCTTCCACA GGTTTCCTTT GGATCCTAAA 
601 AGAAGAAAAG AATGGGTTCG CCTGGTTAGG CGCAAAAATT TTGTGCCAGG 
651 AAAACACACT TTTCTTTGTT CAAAGCACTT TGAAGCCTCC TGTTTTGACC 
701 TAACAGGACA AACTCGACGA CTTAAAATGG ATGCTGTTCC AACCATTTTT 
751 GATTTTTGTA CCCATATAAA GTCTATGAAA CTCAAGTCAA GGAATCTTTT 
801 GAAGAAAAAC AACAGTTGTT CTCCAGCTGG ACCATCTAAT TTAAAATCAA 
851 ACATTAGTAG TCAGCAAGTA CTACTTGAAC ACAGCTATGC CTTTAGGAAT 
901 CCTATGGAGG CAAAAAAGAG GATCATTAAA CTGGAAAAAG AAATAGCAAG 
951 CTTAAGAAGA AAAATGAAAA CTTGCCTACA AAAGGAACGC AGAGCAACTC 
1001 GAAGATGGAT CAAAGCCACG TGTTTGGTAA AGAATTTAGA AGCAAATAGT 
1051 GTATTACCTA AAGGTACATC AGAACACATG TTACCAACTG CCTTAAGCAG 
1101 TCTTCCCTTG GAAGATTTTA AGATCCTTGA ACAAGATCAA CAAGATAAAA 
1151 CACTGCTAAG TCTAAATCTA AAACAGACCA AGAGTACCTT CATTTAAATT 
1201 TAGCTTGCAC AGAGCTTGAT GCCTATCCTT CATTCTTTTC AGAAGTAAAG 
1251 ATAATTATGG CACTTATGCC AAAATTCATT ATTTAATAAA GTTTTACTTG 
1301 AAGTAACATT ACTGAATTTG TGAAGACTTG ATTACAAAAG AATAAAAAAC 
1351 TTCATATGGA AATTTTATTT GAAAATGAGT GGAAGTGCCT TACATTAGAA 
1401 TTACGGACTT AAAAATTTTG CTAATAAATT GTGTGTTTGA AAGGTGTTTT 
1451 TTGTTTTTGT CTTTTTAAAC TACTGTTAAA AGAACAGCTT ATGATAAGTA 
1501 ATATGTTTAA CTTAGAGAAG AATTTTTTCC TGTACCAAAG TTGGCATATT 
1551 GCATTCTAAA TAAGATGCTA AATAAGAGTT AACCAACATT CAACATGACC 
1601 TTAAAACTGC TGGGTTTTGT ATTAATTAAA TTATAATTGG CACTGTGATT 
1651 TGAAAAATTT ATAGAAAAAA AGGTACAGGG CAAGTTTTTA AATTAAAACT 
1701 TTCTATATTT TGTTTTACCA GTAAAAGTGA GCTTATCATG GCCTCTCTCA 
1751 TAAGAATGAT TTTAAAATAG GTTGTAAAAT ATTTTGAAAA TATTTGAATG 
1801 TGAAGTACCA TTGAGTCATC CAAACTAGGT AAGGCCTCAA GTACTTTAAA 
1851 CTAGTAAAAT CTAGTAGCTG ATAATATTCA CCTAAGTAAG TGTTGTAAAA 
1901 TAATTCAGAG TTCAGGACCT AGCTTAGATA AATGTATACT ACTCTTTTTC 
1951 TCATAGTAAA AATCTTACAT TTCCAACTTC AAAATTGGTG CTTCCATATT 
2001 TGTTGATAAC CAAAACTCCT AAGGTTTTTT GTTTTCTTTT TAACTACTTT 
2051 CCAAATGCAT ACTATACCTC AGAAATAGTG TATCAATATA GTGGGCTTTT 
2101 TTTTTCCTCT TCATAAACCC ACAGTAAAAT TTAATCACAG GAAACTACTT 
2151 ATATCTTCAC ACTTTGTATT GATAACTTAA AATGGCATCA GTTTATCTTA 
2201 GACATCAGCT TGCTTTTTAT CTCCTTTTTT AGTGAGTGAA ATAGAGCAAC 
2251 TAGCATGCCT GTGTTCCCAG CTACTTGGGA GGCTAAGGTG GGAAGATCAA 
2301 TTGAACCTAG GAGGTTGAGG CTATAGTGAG CTGTGATTGC ACGACTGCAC 
2351 TCCAGCCTGG GCAATGGAGT GAGACTCCTG TCTCTAAAAC AGCAACAACA 
2401 AAAATAAAGC AACCATAGTG CATAAGGGAA ATTAAATGTT CCCTATAGAA 
2451 ATATGTGTAT GTCTGTGATA GTGGTATGCA AATGCTAATT ATTTTATAAA 
2501 ATAAAAGTTC AGAACTATTC TTATCATTGC CACTTGAACA ATTAAAGGGT 
2551 TTGCTTTATT TCACTAATGT TTAATAGGAA CCCTTTGCTT CAAACAGCTT 
2601 TGTTGAAATC ATGTAAAAAT TTGTTAATAG AGAATCAAGT TATTTAACTC 
2651 AACTTATTTA ATTCAAGCTT GTGATACTAA CATACAAAGG TAGCATAAAC 
2701 CAAGTCATAA ATTGCTGTAA TCTTTCCTGT AGAGTAATAG CTACTTCATG 
2751 ATTTTTTTAA AAATTTCATT TTTTTGCTAT TTAGGATTGC ATTTGCTTGG 
2801 CTCCTAGTAA CAATTCTTTT ACAGTATTAG CACTCTCTTT ACTAAGGAAT 
2851 GCCTCCCAAG GAAATGCAAA GGTAGGAAAA GTCTCTTAGA ATGCCCATGA 
2901 GGTATTTAAA ACAGATATTT ATGAAAATCT TTTTGTGAAT GTTATAAATC 
2951 TTGCTAGTTA TTTTATCTTT ATCTTAAGTA TTAGATGTAG TTCCTTGGAA 
3001 TTGTCATTAC ATATTTATTT TTTTCTAGTG TGGTTTCAAA TAACTTTTTG 
3051 CCAACATATA ATCATCATCA AACATTCACT GACCATATCT ATTTTATAAC 
3101 TCAAAATAAG TTGGACAAAT AATCATTTTA ATAAAAACTA TTTTTTCCAA 
3151 GTATAACCAC TGTCATGTGG TTCACCCTTC ACCCCAGATA CAAAACACTT 
3201 ATTTGTGTAG CCCAGTTCCC ATCTACAGTA ATACCTTGAA ACCTTAATAA 
3251 ATTTTAAAAA TCATAAAAAT AAAATATTGT AAAATACAAC AAATTTTGGA 
3301 CAAGGTTACT TCATCTTCAT TCATTATTAC CTGACAGTAT TAAACTACTA 
3351 CTCAATAATT TTAGAGTAAA CTTTTCTGTG TTTTCCCCGT GATTTTCATT 
3401 GTGCTGTCCT GACAACATGC TCCAAACTCT TTGCATCAAA TTGTTTTATT 
3451 AACATACATT TGTCTACCTT AAAACTAGCT TTATTCACAG AGAAAGACCT 
3501 AAAAGGAGTC TATTAAAATG CTGCTTTCAG TTTGATAGTT TTTTTTTTAA 
3551 TCACTCTGAC CATAAACTAA CTGAAATTAT AATGGATTTT TTTTCCTCTC 
3601 CCGGTCACAA CACAGATCTT CTGTTCATTT GTTCTCTGTC TACTGGGCAC 
3651 CAACCTCTAC AAAGAACCAG CCAAAGGCTA GGTACTTGAT ATAAAAAGGA 
3701 ATATTACATT ATTTTCTGCC CTCAAGTTGC TCTATCTCCT GAAAGAAACA 
3751 AGTAATATTT ATAATACAAT ATGATAAATG CTACAAAAGA AATAGCTGTA 
3801 AAGTCCTTTG GTAAATGCTG TTGAATTGGA ATTCAGTAAG AACTATAAAC 
3851 TGTAGACCTT TTTATAATCA AATGCTTTTG TCTTGAAACA AAACAGATTC 
3901 CTCCTTATAT TGACTTAGCA AAGGAGGTAC AAGGACATTG GCATTTGACC 
3951 TGAATTATGG TGTTTTATTG AATGAGCTAT AAGACAACAT TTTTACCCTT 
4001 TAAAATGAAC ACTGAACAAA TGTGTTAATG GTATCTTTGT TAAAAGGAAA 
4051 ACATAGCTAT AAATAAAATA CTACATCGAA ATCCAGCACT GGAGTTCATT 
4101 TGAAATTTGA TATTTTGTGT AAAGTAACAA ACCTATTAAC ACAGATTTTT 
4151 AAAATAACTC AGAATCGTAT AAAGCACTTT GGTACTTATT TGTTCTCTTT 
4201 TCCCTTACAT TCTGTGTGGT AGGTGGTATT ATCTCTGATT TACACATGAA 
4251 GACATCCTTG TTAATGCAAT TTATTTATTC ATTCGGGCAT TTACTGTGTG 
4301 CCAACTTGCA AAAGGAATAG AAATGTCTGT GATCTAGATA GTTCTAGATT 
4351 GAACATAGAT TTTCTGCCAA CAAATCCTCT CTGCTGTTCA CATTATCCTT 
4401 TGTTTAACGT ATGAACCAGG TTACTAAAAT AGGATAAATC ATGTGTCTTA 
4451 GAATATGAAA ATAGTAAGGT CTTTGAGGTC ACTTGATCTT CTCTAAGTAG 
4501 ACTTTATAAT ATTGTGTTTT ATCTCATTTC TCAATATTAG AATACGGGTA 
4551 GATTTTAATT TTGCTATAAT ATAGGAAATG GTTCATCTTT GTACCAAAAT 
4601 ATTGCATTCT TCTGATATTT AGACAGTTGG AAACTTTCTA AAATTGAGGA 
4651 TTTTGTAGTG TATACTAAAT AATTGCATAT TCAAAAAAAT GTATTCTGAG 
4701 TATGGTGATA TTAAACATTT TTCCCCAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98107671: 

Regulation of interferon-induced protein kinase PKR: 
modulation of P58IPK inhibitory function by a novel protein, 
P52rIPK 



Peptide information for frame 1 



ORF from 511 bp to 1194 bp; peptide length: 228 
Category: similarity to known protein 



1 MPTNCAAAGC ATTYNKHINI SFHRFPLDPK RRKEWVRLVR RKNFVPGKHT 

51 FLCSKHFEAS CFDLTGQTRR LKMDAVPTIF DFCTHIKSMK LKSRNLLKKN 

101 NSCSPAGPSN LKSNISSQQV LLEHSYAFRN PMEAKKRIIK LEKEIASLRR 

151 KMKTCLQKER RATRRWIKAT CLVKNLEANS VLPKGTSEHM LPTALSSLPL 
201 EDFKILEQDQ QDKTLLSLNL KQTKSTFI 
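
The ORF coordinates and the peptide length reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the reported start and end positions are inclusive and that the stop codon lies outside the stated range; the function name `peptide_length` is illustrative and not part of the annotation output.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF given inclusive bp coordinates."""
    span = orf_end - orf_start + 1  # inclusive span in base pairs
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF reported for DKFZphfbr2_22i4, frame 1: 511 bp to 1194 bp
print(peptide_length(511, 1194))  # prints 228
```

Under these assumptions the result agrees with the stated peptide length of 228.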

BLASTP hits 

Entry AF007393_1 from database TREMBL: 

product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 
Score = 166, P = 2.5e-11, identities = 40/106, positives = 56/106 
Alert BLASTP hits for DKFZphfbr2_22i4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22i4, frame 1 

Report for DKFZphfbr2_22i4.1 

[LENGTH] 228
[MW] 26259.94
[pi] 10.17
[HOMOL] TREMBL:AF007393_1 product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 1e-09
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.02 %



SEQ MPTNCAAAGCATTYNKHINISFHRFPLDPKRRKEWVRLVRRKNFVPGKHTFLCSKHFEAS 

SEG 

PRD cccccccccccccccccccceeeecccccchhhhhhhhhhhhhcccccceeehhhhhhhh 

SEQ CFDLTGQTRRLKMDAVPTIFDFCTHIKSMKLKSRNLLKKNNSCSPAGPSNLKSNISSQQV 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccceeeeccccchhhhhhhhhhhccccccccccccccccccchhh 

SEQ LLEHSYAFRNPMEAKKRIIKLEKEIASLRRKMKTCLQKERRATRRWIKATCLVKNLEANS 

SEG 

PRD hhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeecccccc 

SEQ VLPKGTSEHMLPTALSSLPLEDFKILEQDQQDKTLLSLNLKQTKSTFI 

SEG 

PRD cccccccccccccccccccccchhhhhhcccccccccccccccccccc 
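
The LOW_COMPLEXITY figure in the report can be related to the SEG track shown above. The sketch below assumes that each `x` in a SEG line marks one masked residue and that the percentage is the masked fraction of the full peptide, rounded to two decimals; `low_complexity_percent` is a hypothetical helper name.

```python
def low_complexity_percent(seg_mask: str, protein_length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask string."""
    masked = seg_mask.count("x")
    return round(100.0 * masked / protein_length, 2)


# The single SEG run shown for DKFZphfbr2_22i4.1 (16 masked residues, 228 aa)
seg_mask = "xxxxxxxxxxxxxxxx"
print(low_complexity_percent(seg_mask, 228))  # prints 7.02
```

Under these assumptions the value matches the 7.02 % listed in the report.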



Prosite for DKFZphfbr2_22i4.1

PS00001 19->23 ASN_GLYCOSYLATION PDOC00001
PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00001 114->118 ASN_GLYCOSYLATION PDOC00001
PS00004 160->164 CAMP_PHOSPHO_SITE PDOC00004
PS00005 68->71 PKC_PHOSPHO_SITE PDOC00005
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 163->166 PKC_PHOSPHO_SITE PDOC00005
PS00006 60->64 CK2_PHOSPHO_SITE PDOC00006
PS00006 78->82 CK2_PHOSPHO_SITE PDOC00006
PS00008 9->15 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_22i4.1) 
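
The Prosite hits listed for this peptide can be reproduced for a single pattern with a regular expression. The sketch below is an illustration only: it uses the PS00001 consensus N-{P}-[ST]-{P}, assumes 1-based start positions, and assumes the `start->end` notation means `start` to `start + 4`, as the listed ranges suggest. The names `PEPTIDE`, `N_GLYC` and `n_glycosylation_sites` are hypothetical.

```python
import re

# DKFZphfbr2_22i4.1 peptide (228 aa), copied from the listing above
PEPTIDE = (
    "MPTNCAAAGCATTYNKHINISFHRFPLDPKRRKEWVRLVRRKNFVPGKHTFLCSKHFEAS"
    "CFDLTGQTRRLKMDAVPTIFDFCTHIKSMKLKSRNLLKKNNSCSPAGPSNLKSNISSQQV"
    "LLEHSYAFRNPMEAKKRIIKLEKEIASLRRKMKTCLQKERRATRRWIKATCLVKNLEANS"
    "VLPKGTSEHMLPTALSSLPLEDFKILEQDQQDKTLLSLNLKQTKSTFI"
)

# PS00001 consensus N-{P}-[ST]-{P}; the lookahead keeps overlapping sites
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(seq: str) -> list[tuple[int, int]]:
    """Return (start, start + 4) pairs using 1-based start positions."""
    return [(m.start() + 1, m.start() + 5) for m in N_GLYC.finditer(seq)]


for start, end in n_glycosylation_sites(PEPTIDE):
    print(f"PS00001 {start}->{end} ASN_GLYCOSYLATION")
```

With these assumptions the scan reports sites starting at residues 19, 100 and 114, in line with the three ASN_GLYCOSYLATION entries.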
DKFZphfbr2_22k3 



group: brain derived 

DKFZphfbr2_22k3 encodes a novel 538 amino acid protein with weak similarity to extensins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



weak similarity to extensins 

complete cDNA, complete cds, few EST hits 
CpG Island in 5' UTR complete cDNA 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 2775 bp 

Poly A stretch at pos. 2755, polyadenylation signal at pos. 2718 



1 GGGGCTGCCC GCGCGCTCCA CGGTGCAGAG CTCTAAGCGC GCGGGCTGGC 
51 AGGCTGCGGC GCGTCAAGGT CAGCCTGGAG CTGGGTGGCG GCCTGCCTGG 
101 GGGCGGGGGA CCCTACTGGA GGCCCGGGCT GGGGCCTCCC AGCGCCTCGG 
151 CCATATTGAA TAGCTTCGAC TGGACCGTCT TTGTCTGCGA AGTCCTGTCC 
201 CAAGTTCCAG CCGCGTCCCT GGGGCCTGGG GCAGGAAGAG TCGCTGGCAG 
251 CCCGCGCGCC CCAACTTGGA GCTGGGACAC CACGTTTCCA GCTTGGAGTG 
301 GGCCTTGAGC CTTGGGACTG ACCTCGCCCC CGGCTCACGT AGGCATCCTG 
351 GAAATTGATT CCCCCAAGTC CTTGGTGGGG GAGCCGGACT TGGTCAAGAC 
401 TGTACTTGTT GCAGGCGAAG AGATTGGAGG CGTTTGGCTC GTCCCTGGCT 
451 AGGGAGGTGA GACTCTCCGG TCAGCGTTGC TGGAACTCCC CCCATCCAGT 
501 CCCTCCCTCA AGACTAAGGG CTACAGTAGT TTGTTGGGGC TCATTGCCCC 
551 CTCACCCCAG ATATCACCCT GGAGATCTTA AAGACTCTCG AGAAAAGCCA 
601 CGTGGGGGGC TGGTTCCCCT GGGGCTTCCT GCCGTCCCCC GACTGCCTCA 
651 TTCTTTGGAG CGTCCCCGAT GTCTGCAAAG ATGTGGATTT GGACGTCCTC 
701 GTGGAAGCCC TAAAGCCCGT GGGGACATTT AAGAAGATCG GCAAGGTGTT 
751 CCGCAAGGAG GAGGACTCCA CGGTGGGGAT GCTGCAGATC GGGGAGGACG 
801 TCGACTATTT GCTCATCCCC CGGGAGGTCA GGCTGGCTGG GGGCGTCTGG 
851 AGAGTCATCT CTAAGCCCGC CACCAAGGAA GCAGAATTTC GGGAGCGGCT 
901 GACCCAGTTC CTGGAAGAAG AGGGCCGCAC CCTGGAGGAC GTGGCCCGCA 
951 TCATGGAGAA GAGCACCCCG CACCCGCCCC AGCCCCCCAA AAAGCCCAAG 
1001 GAGCCCCGAG TGAGGAGGAG AGTGCAGCAG ATGGTGACTC CTCCGCCCCG 
1051 GCTGGTCGTG GGCACGTACG ACAGCAGCAA CGCCAGCGAC AGCGAGTTCA 
1101 GCGACTTCGA GACCTCCAGA GACAAGAGCC GCCAGGGCCC GCGGCGGGGC 
1151 AAGAAGGTGC GCAAAATGCC CGTCAGCTAC CTGGGCAGCA AGTTCCTGGG 
1201 AAGCGACCTG GAGAGTGAGG ATCATGAGGA ACTGGTCGAG GCCTTCCTCC 
1251 GGCGACAGGA GAAGCAGCCC AGCGCGCCGC CTGCCCGCCG CCGCGTCAAC 
1301 CTGCCAGTGC CCATGTTTGA GGACAACCTG GGGCCTCAGC TGTCCAAAGC 
1351 GGACAGGTGG CGGGAGTATG TCAGCCAGGT GTCCTGGGGG AAGCTGAAGC 
1401 GGAGGGTGAA GGGTTGGGCG CCGAGGGCGG GCCCCGGGGT GGGCGAGGCC 
1451 CGGCTGGCCT CCACCGCAGT GGAGAGCGCA GGGGTATCAT CGGCGCCAGA 
1501 GGGCACCAGC CCGGGGGATC GCTTGGGAAA CGCGGGAGAT GTTTGTGTGC 
1551 CCCAGGCTTC CCCTAGGCGA TGGAGGCCCA AGATCAACTG GGCCTCCTTT 
1601 CGGCGCCGCA GGAAGGAGCA GACAGCACCC ACAGGTCAGG GGGCAGACAT 
1651 CGAGGCTGAT CAGGGGGGAG AGGCTGCAGA TAGTCAAAGG GAAGAGGCCA 
1701 TAGCTGACCA GCGGGAAGGG GCTGCAGGTA ATCAGAGGGC TGGGGCCCCA 
1751 GCTGACCAGG GGGCAGAGGC TGCAGATAAT CAGAGGGAAG AGGCTGCAGA 
1801 TAATCAGAGG GCAGGGGCCC CAGCTGAGGA GGGGGCAGAG GCTGCAGATA 
1851 ACCAGAGGGA AGAGGCTGCA GATAATCAGA GGGCAGAGGC CCCAGCTGAC 
1901 CAGAGGTCAC AGGGCACAGA TAACCACAGG GAAGAGGCTG CAGATAATCA 
1951 GAGGGCGGAG GCCCCAGCTG ACCAGGGGTC AGAGGTTACA GATAATCAAA 
2001 GGGAAGAGGC CGTACATGAC CAGAGGGAAA GGGCCCCAGC TGTCCAGGGT 
2051 GCAGATAATC AGAGGGCACA GGCCCGGGCT GGCCAGAGGG CAGAGGCTGC 
2101 ACATAATCAG AGGGCAGGGG CCCCAGGTAT CCAGGAAGCT GAAGTCTCAG 
2151 CTGCCCAAGG GACCACAGGA ACAGCTCCAG GAGCCAGGGC CCGGAAACAG 
2201 GTCAAGACAG TGAGGTTCCA GACCCCTGGA CGCTTTTCGT GGTTTTGCAA 
2251 GCGCCGGAGA GCCTTCTGGC ACACTCCCCG GTTGCCAACC CTGCCCAAGA 
2301 GAGTCCCCAG GGCAGGAGAG GTCAGGAACC TCAGGGTGCT GAGGGCCGAG 
2351 GCCAGAGCAG AAGCTGAGCA GGGAGAGCAA GAAGACCAGC TGTGAGGTGA 
2401 GGGCTAGAGA CAGCCCACGG GCCCTCCCTC CAAGTGTGGG AGGGAGAGAT 
2451 GCTCTGCCTC TGAACTTCAA AGTGGAGGTG GAGTGCTGGC CACGTCTCCA 
2501 CCTAACAACC CTCTTTATTC TCTTGTTAAA GTTTTGTTCA TGCTTTGATT 
2551 TTTTTTTAAA TTTTTTAGAG ACAGGGTCTC ACTCTGTTGC CCAGGCTGGA 
2601 GTGCAGTGGC ATGATCATAA CTCACTGCAG CCTCAAACTT CTGGCCTCAA 
2651 GTGATCCTCC TGCCTCGGCC TCCCAAAATG CTGGGATTAC AGATGTGAGC 
2701 CACCACACAC ACCATCTGAT TAAAAAAAAA AAATACTGAT TCCCTGTAGC 
2751 AACCCAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS164A7F from database EMBL: 

H. sapiens CpG island DNA genomic Msel fragment, clone 164a7, forward 
read cpg164a7.ft1a. 
Score = 740, P = 3.0e-25, identities = 150/151 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 779 bp to 2392 bp; peptide length: 538 
Category: similarity to known protein 



1 MLQIGEDVDY LLIPREVRLA GGVWRVISKP ATKEAEFRER LTQFLEEEGR 
51 TLEDVARIME KSTPHPPQPP KKPKEPRVRR RVQQMVTPPP RLVVGTYDSS 
101 NASDSEFSDF ETSRDKSRQG PRRGKKVRKM PVSYLGSKFL GSDLESEDDE 
151 ELVEAFLRRQ EKQPSAPPAR RRVNLPVPMF EDNLGPQLSK ADRWREYVSQ 
201 VSWGKLKRRV KGWAPRAGPG VGEARLASTA VESAGVSSAP EGTSPGDRLG 
251 NAGDVCVPQA SPRRWRPKIN WASFRRRRKE QTAPTGQGAD IEADQGGEAA 
301 DSQREEAIAD QREGAAGNQR AGAPADQGAE AADNQREEAA DNQRAGAPAE 
351 EGAEAADNQR EEAADNQRAE APADQRSQGT DNHREEAADN QRAEAPADQG 
401 SEVTDNQREE AVHDQRERAP AVQGADNQRA QARAGQRAEA AHNQRAGAPG 
451 IQEAEVSAAQ GTTGTAPGAR ARKQVKTVRF QTPGRFSWFC KRRRAFWHTP 
501 RLPTLPKRVP RAGEVRNLRV LRAEARAEAE QGEQEDQL 



BLASTP hits 



Entry RNU67136_1 from database TREMBL: 

"A-kinase anchoring protein AKAP150"; Rattus norvegicus 
A-kinase anchoring protein AKAP150 mRNA, complete cds. Rattus 
norvegicus (Norway rat) 
Length = 714 

Score = 182 (64.1 bits), Expect = 1.2e-10, P = 1.2e-10 
Identities = 73/257 (28%), Positives = 104/257 (40%) 



Alert BLASTP hits for DKFZphfbr2_22k3, frame 2 



TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 

S-antigen gene, complete cds., N = 1, Score = 178, P = 3.7e-11 



>TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916 
S-antigen gene, complete cds. 
Length = 285 



HSPs: 



Score = 178 (26.7 bits), Expect = 3.7e-11, P = 3.7e-11 
Identities = 60/217 (27%), Positives = 97/217 (44%) 



Query: 269 INWASFRRRRKEQTAPTGQGA-DIEADQGGEAADSQRE-EAIADQ REGAAGNQRAGA 323 

+N + + + E G+G D E E +D+ E E I 0 E A N+ AG+ 

Sbjct: 47 LNGKNGKGNKYEDLQEEGEGENDDEEHSNSEESDNDEENEIIVGQDGSNEKAGSNEEAGS 106 

Query: 324 PADQGAEAADNQREEAADNQRAGAPAEEGA — EAADNQR EEAADNQRAEAPADQRS 377 

G+ E+A N++AG^ E G+ EA N+ EEA N++A + S 

Sbjct: 107 NEKAGSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGS 166 

Query: 378 QGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQAR— AG 435 

EEA N++A + + GS E+A +++ + G+ N++A + AG 

Sbjct: 167 NEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGS-NEKAGSNEEAG 225 



Query: 436 QRAEAAHNQRAGA PGIQEAEVSAAQGTTGTA-PGA 469 
EA N+ AG+ G E + +G GT PG+ 
Sbjct: 226 SNEEAGSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGS 263 

Score = 173 (26.0 bits), Expect = 1.5e-10, P = 1.5e-10 
Identities = 51/190 (25%), Positives = 83/190 (43%) 



Query: 279 KEQTAPTGQ-GADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQRE 337
           +E GQ G++ +A EA +++ A E A N++AG+ G+ E
Sbjct: 83 EENEIIVGQDGSNEKAGSNEEAGSNEK AGSNEEAGSNEKAGSNEKAGSNEEAGSNE 138

Query: 338 EAADNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPA 397
           EA N+ AG+ E G+ h+A N ++A + + b LEA N + +A +
Sbjct: 139 EAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNE 198

Query: 398 DQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVS 457
           GS EEA +++ + G++ + AG EA N+ AG+ EA
Sbjct: 199 KAGSNEKAGSNEEAGSNEKAGSNEEAGSNEE AGSNEEAGSNEEAGSNEGSEAGTE 253

Query: 458 AAQGTTGTAPG 468
           +GT G G
Sbjct: 254 GPKGTGGPGSG 264




Score = 147 (22.1 bits), Expect = 1.6e-07, P = 1.6e-07
Identities = 40/168 (23%), Positives = 70/168 (41%)

Query: 288 GADIEADQGGEAADSQR--EEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRA 345
           G++ EA +A +++ A E A N+ AG+ + G+ E+A N++A
Sbjct: 111 GSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKA 170

Query: 346 GAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTD 405
           G+ E G+ EEA N++A + S EEA N++A + + GS
Sbjct: 171 GSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEEA 230

Query: 406 NQREEAVHDQR--ERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGI 451
           EEA ++ + G + + G E +HN++ I
Sbjct: 231 GSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGSGGEHSHNKKKSKKSI 278




Score = 101 (15.2 bits), Expect = 2.5e-02, P = 2.4e-02
Identities = 26/100 (26%), Positives = 47/100 (47%)

Query: 281 QTAPTGQGADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAA 340
           + A + + A + G EEA ++++ G+ N++AG+ G+ E+A
Sbjct: 162 EKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGS--NEKAGSNEKAGSNEEAGSNEKAG 219

Query: 341 DNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGT 380
           N+ AG+ E G+ EEA N+ +EA + +GT
Sbjct: 220 SNEEAGSNEEAGSNEEAGSNEEAGSNEGSEA-GTEGPKGT 258





Pedant information for DKFZphfbr2_22k3, frame 2 



Report for DKFZphfbr2_22k3.2 



[LENGTH] 538
[MW] 59402.19
[pi] 8.72
[HOMOL] TREMBL:AF037364_1 gene: "MA1"; product: "paraneoplastic neuronal antigen MA1"; Homo sapiens paraneoplastic neuronal antigen MA1 (MA1) mRNA, complete cds. 4e-10
[PROSITE] AMIDATION 1
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 11
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 18.03 %



SEQ MLQIGEDVDYLLIPREVRLAGGVWRVISKPATKEAEFRERLTQFLEEEGRTLEDVARIME 

SEG 

PRD cccccccccccccccccccccceeeeeeecccchhhhhhhhhhhhhhhccchhhhhhhhh 

SEQ KSTPHPPQPPKKPKEPRVRRRVQQMVTPPPRLVVGTYDSSNASDSEFSDFETSRDKSRQG 

SEG xxxxxxxxxxxxxxxxxxx 

PRD hcccccccccccccccchhhhhhhhhccccceeeeecccccccccccccccccccccccc 

SEQ PRRGKKVRKMPVSYLGSKFLGSDLESEDDEELVEAFLRRQEKQPSAPPARRRVNLPVPMF 

SEG xxxxxxxxxxx 

PRD ccccccccccceeeccccccccccccchhhhhhhhhhhhhhccccccchhhhhccccccc 
SEQ EDNLGPQLSKADRWREYVSQVSWGKLKRRVKGWAPRAGPGVGEARLASTAVESAGVSSAP 

SEG 

PRD cccccccchhhhhhhhhheeeeccchhhhhhccccccccccchhhhhhhhhhhccccccc 

SEQ EGTSPGDRLGNAGDVCVPQASPRRWRPKINWASFRRRRKEQTAPTGQGADIEADQGGEAA 

SEG 

PRD cccccccccccccceeeecccccccccccchhhhhhhhhhhhhcccccchhhhhccchhh 

SEQ DSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRAGAPAEEGAEAADNQR 

SEG xxxxxxxxxxxxx xxxxxxxxxxxx . . . . 

PRD hhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhccccchhhhhhhhhhh 

SEQ EEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAP 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ AVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVSAAQGTTGTAPGARARKQVKTVRF 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccccccchhhhhhhhhhh 

SEQ QTPGRFSWFCKRRRAFWHTPRLPTLPKRVPRAGEVRNLRVLRAEARAEAEQGEQEDQL 

SEG xxxxxxxxxxxxxx. . . 

PRD cccccceeehhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccc 



Prosite for DKFZphfbr2_22k3.2

PS00001 101->105 ASN_GLYCOSYLATION PDOC00001
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005
PS00005 273->276 PKC_PHOSPHO_SITE PDOC00005
PS00005 302->305 PKC_PHOSPHO_SITE PDOC00005
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005
PS00005 499->502 PKC_PHOSPHO_SITE PDOC00005
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006
PS00006 112->116 CK2_PHOSPHO_SITE PDOC00006
PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 229->233 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006
PS00008 95->101 MYRISTYL PDOC00008
PS00008 220->226 MYRISTYL PDOC00008
PS00008 242->248 MYRISTYL PDOC00008
PS00008 296->302 MYRISTYL PDOC00008
PS00008 314->320 MYRISTYL PDOC00008
PS00008 317->323 MYRISTYL PDOC00008
PS00008 328->334 MYRISTYL PDOC00008
PS00008 352->358 MYRISTYL PDOC00008
PS00008 400->406 MYRISTYL PDOC00008
PS00008 450->456 MYRISTYL PDOC00008
PS00008 461->467 MYRISTYL PDOC00008
PS00008 464->470 MYRISTYL PDOC00008
PS00009 123->127 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_22k3.2) 
DKFZphfbr2_22k8 



group: brain derived 

DKFZphfbr2_22k8 encodes a novel 172 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 
Locus: /map="7" 
Insert length: 2789 bp 

Poly A stretch at pos. 2769, polyadenylation signal at pos. 2756 



1 GGGGGAGCCA TGAGGCGCCA GCCTGCGAAG GTGGCGGCGC TGCTGCTCGG 
51 GCTGCTCTTG GAGTGCACAG AAGCCAAAAA GCATTGCTGG TATTTCGAAG 
101 GACTCTATCC AACCTATTAT ATATGCCGCT CCTACGAGGA CTGCTGTGGC 
151 TCCAGGTGCT GTGTGCGGGC CCTCTCCATA CAGAGGCTGT GGTACTTCTG 
201 GTTCCTTCTG ATGATGGGCG TGCTTTTCTG CTGCGGAGCC GGCTTCTTCA 
251 TCCGGAGGCG CATGTACCCC CCGCCGCTGA TCGAGGAGCC AGCCTTCAAT 
301 GTGTCCTACA CCAGGCAGCC CCCAAATCCC GGCCCAGGAG CCCAGCAGCC 
351 GGGGCCGCCC TATTACACTG ACCCAGGAGG ACCGGGGATG AACCCTGTCG 
401 GGAATTCCAC GGCAATGGCT TTCCAGGTCC CACCCAACTC ACCCCAGGGG 
451 AGTGTGGCCT GCCCGCCCCC TCCAGCCTAC TGCAACACGC CTCCGCCCCC 
501 GTACGAACAG GTAGTGAAGG CCAAGTAGTG GGGTGCCCAC GTGCAAGAGG 
551 AGAGACAGGA GAGGGCCTTT CCCTGGCCTT TCTGTCTTCG TTGATGTTCA 
601 CTTCCAGGAA CGGTCTCGTG GGCTGCTAAG GGCAGTTCCT CTGATATCCT 
651 CACAGCAAGC ACAGCTCTCT TTCAGGCTTT CCATGGAGTA CAATATATGA 
701 ACTCACACTT TGTCTCCTCT GTTGCTTCTG TTTCTGACGC AGTCTGTGCT 
751 CTCACATGGT AGTGTGGTGA CAGTCCCCGA GGGCTGACGT CCTTACGGTG 
801 GCGTGACCAG ATCTACAGGA GAGAGACTGA GAGGAAGAAG GCAGTGCTGG 
851 AGGTGCAGGT GGCATGTAGA GGGGCCAGGC CGAGCATCCC AGGCAAGCAT 
901 CCTTCTGCCC GGGTATTAAT AGGAAGCCCC ATGCCGGGCG GCTCAGCCGA 
951 TGAAGCAGCA GCCGACTGAG CTGAGCCCAG CAGGTCATCT GCTCCAGCCT 
1001 GTCCTCTCGT CAGCCTTCCT CTTCCAGAAG CTGTTGGAGA GACATTCAGG 
1051 AGAGAGCAAG CCCCTTGTCA TGTTTCTGTC TCTGTTCATA TCCTAAAGAT 
1101 AGACTTCTCC TGCACCGCCA GGGAAGGATA GCACGTGCAG CTCTCACCGC 
1151 AGGATGGGGC CTAGAATCAG GCTTGCCTTG GAGGCCTGAC AGTGATCTGA 
1201 CATCCACTAA GCAAATTTAT TTAAATTCAT GGGAAATCAC TTCCTGCCCC 
1251 AAACTGAGAC ATTGCATTTT GTGAGCTCTT GGTCTGATTT GGAGAAAGGA 
1301 CTGTTACCCA TTTTTTTGGT GTGTTTATGG AAGTGCATGT AGAGCGTCCT 
1351 GCCCTTTGAA ATCAGACTGG GTGTGTGTCT TCCCTGGACA TCACTGCCTC 
1401 TCCAGGGCAT TCTCAGGCCC GGGGGTCTCC TTCCCTCAGG CAGCTCCAGT 
1451 GGTGGGTTCT GAAGGGTGCT TTCAAAACGG GGCACATCTG GCCGGGAAGT 
1501 CACATGGACT CTTCCAGGGA GAGAGACCAG CTGAGGCGTC TCTCTCTGAG 
1551 GTTGTGTTGG GTCTAAGCGG GTGTGTGCTG GGCTCCAAGG AGGAGGAGCT 
1601 TGCTGGGAAA AGACAGGAGA AGTACTGACT CAACTGCACT GACCATGTTG 
1651 TCATAATTAG AATAAAGAAG AAGTGGTCGG AAATGCACAT TCCTGGATAG 
1701 GAATCACAGC TCACCCCAGG ATCTCACAGG TAGTCTCCTG AGTAGTTGAC 
1751 GGCTAGCGGG GAGCTAGTTC CGCCGCATAG TTATAGTGTT GATGTGTGAA 
1801 CGCTGACCTG TCCTGTGTGC TAAGAGCTAT GCAGCTTAGC TGAGGCGCCT 
1851 AGATTACTAG ATGTGCTGTA TCACGGGGAA TGAGGTGGGG GTGCTTATTT 
1901 TTTAATGAAC TAATCAGAGC CTCTTGAGAA ATTGTTACTC ATTGAACTGG 
1951 AGCATCAAGA CATCTCATGG AAGTGGATAC GGAGTGATTT GGTGTCCATG 
2001 CTTTTCACTC TGAGGACATT TAATCGGAGA ACCTCCTGGG GAATTTTGTG 
2051 GGAGACACTT GGGAACAAAA CAGACACCCT GGGAATGCAG TTGCAAGCAC 
2101 AGATGCTGCC ACCAGTGTCT CTGACCACCC TGGTGTGACT GCTGACTGCC 
2151 AGCGTGGTAC CTCCCATGCT GCAGGCCTCC ATCTAAATGA GACAACAAAG 
2201 CACAATGTTC ACTGTTTACA ACCAAGACAA CTGCGTGGGT CCAAACACTC 
2251 CTCTTCCTCC AGGTCATTTG TTTTGCATTT TTAATGTCTT TATTTTTTGT 
2301 AATGAAAAAG CACACTAAGC TGCCCCTGGA ATCGGGTGCA GCTGAATAGG 
2351 CACCCAAAAG TCCGTGACTA AATTCCGTTT GTCTTTTTGA TAGCAAATTA 
2401 TGTTAAGAGA CAGTGATGGC TAGGGCTCAA CAATTTTGTA TTCCCATGTT 
2451 TGTGTGAGAC AGAGTTTGTT TTCCCTTGAA CTTGGTTAGA ATTGTGCTAC 
2501 TGTGAACGCT GATCCTGCAT ATGGAAGTCC CACTTTGGTG ACATTTCCTG 
2551 GCCATTCTTG TTTCCATTGT GTGGATGGTG GGTTGTGCCC ACTTCCTGGA 
2601 GTGAGACAGC TCCTGGTGTG TAGAATTCCC GGAGCGTCCG TGGTTCAGAG 
2651 TAAACTTGAA GCAGATCTGT GCATGCTTTT CCTCTGCAGC AATTGGCTCG 
2701 TTTCTCTTTT TTGTTCTCTT TTGATAGGAT CCTGTTTCCT ATGTGTGCAA 
2751 AATAAAAATA AATTTGGGCA AAAAAAAAAA AAAAAAAAA 
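
The polyadenylation annotations for this insert can be checked against the final sequence line. The sketch below is a hedged illustration: it searches only the last 39 bases, uses the canonical AATAAA hexamer, and reports 0-based offsets. That the listing's positions (2756 and 2769) are 0-based is an inference from this check, not something the source states; `TAIL`, `TAIL_OFFSET` and both function names are hypothetical.

```python
import re

# Last 39 bases of the 2789 bp insert (1-based positions 2751..2789)
TAIL = "AATAAAAATA" "AATTTGGGCA" "AAAAAAAAAA" "AAAAAAAAA"
TAIL_OFFSET = 2750  # 0-based index of TAIL[0] within the full insert


def signal_positions(tail: str, offset: int) -> list[int]:
    """0-based positions of every AATAAA hexamer, overlaps included."""
    return [offset + m.start() for m in re.finditer(r"(?=AATAAA)", tail)]


def poly_a_start(tail: str, offset: int) -> int:
    """0-based position where the terminal run of A residues begins."""
    return offset + len(tail.rstrip("A"))


print(signal_positions(TAIL, TAIL_OFFSET))  # prints [2750, 2756]
print(poly_a_start(TAIL, TAIL_OFFSET))      # prints 2769
```

One of the two hexamer hits and the start of the terminal A run coincide with the annotated positions when counted from zero.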



BLAST Results 



Entry HS671255 from database EMBL: 
human STS SHGC-11828. 
Length = 400 
Minus Strand HSPs: 

Score = 1822 (273.4 bits), Expect = 4.8e-76, P = 4.8e-76 
Identities = 382/397 (96%), Positives = 382/397 (96%), 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 525 bp; peptide length: 172 
Category: putative protein 
Classification: unset 



1 MRRQPAKVAA LLLGLLLECT EAKKHCWYFE GLYPTYYICR SYEDCCGSRC 

51 CVRALSIQRL WYFWFLLMMG VLFCCGAGFF IRRRMYPPPL IEEPAFNVSY 

101 TRQPPNPGPG AQQPGPPYYT DPGGPGMNPV GNSTAMAFQV PPNSPQGSVA 

151 CPPPPAYCNT PPPPYEQVVK AK 
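
The relationship between the cDNA and the frame 1 peptide can be illustrated by translating the start of the insert. This is a minimal sketch, assuming the standard genetic code and a 1-based ORF start at position 10; only the first 50 bases are used, and `CODON_TABLE`, `translate` and `FIRST_50` are illustrative names.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start: int) -> str:
    """Translate from 1-based `start` until a stop or an incomplete codon."""
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 50 bases of the DKFZphfbr2_22k8 insert, spaces removed
FIRST_50 = "GGGGGAGCCATGAGGCGCCAGCCTGCGAAGGTGGCGGCGCTGCTGCTCGG"
print(translate(FIRST_50, 10))  # prints MRRQPAKVAALLL
```

The output matches the first 13 residues of the peptide listed above.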



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 1 

PIR:S14970 extensin class I (clone w17-1) - tomato, N = 1, Score = 118, 
P = 2.3e-07 



>PIR:S14970 extensin class I (clone w17-1) - tomato 
Length = 132 



HSPs: 



Score = 118 (17.7 bits), Expect = 2.3e-07, P = 2.3e-07 
Identities = 30/82 (36%), Positives = 35/82 (42%) 



Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 

PPP P Y + PPPP PPYYPP+P + PSP 

Sbjct: 32 PPPSPSPPP— PYYYKSPPPPSPSP — PPPYYYKSPPPPDPSPPPPYYYKSPPPPSPSPP 87 



Query: 147 GSVACPPPPAYCNTPPPP--YEQV 168 

PPPP Y + PPPP YE + 
Sbjct: 88 PPSPSPPPPTYSSPPPPPPFYENI 111 

Score = 104 (15.6 bits), Expect = 6.9e-06, P = 6.9e-06 
Identities = 28/78 (35%), Positives = 34/78 (43%) 



Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 

PP P + Y + PP P P P P YY P P +P + + PP P 

Sbjct: 1 PPSPSPPPPY YYKSPPPPSPSP — PPPYYYKSPPPPSPSP PPPYYYKSPP-PPS 51 



Query: 147 GSVACPPPPAYCNTPPPP 164 

S PPPP Y +PPPP 
Sbjct: 52 PS PPPPYYYKSPPPP 66 

Score = 102 (15.3 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 30/78 (38%), Positives = 33/78 (42%) 



Query: 87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146 

PPP P Y+PPPP PPYYPP+P S+ PPP 
Sbjct: 48 PPPSPSPPP — PYYYKSPPPPDPSP — PPPYYYKSPPPPSPSPPPPSPS PP-PPT 97 

Query: 147 GSVACPPPPAYCNTPPPP 164 

S PPPP Y N P PP 
Sbjct: 98 YSSPPPPPPFYENIPLPP 115 



Score = 95 (14.3 bits), Expect = 2.4e-04, P = 2.4e-04 
Identities = 24/61 (39%), Positives = 29/61 (47%) 



Query: 104 PPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPP 163 

PP+P P P P YY P P +P ++ PP P S PPPP Y +PPP 
Sbjct: 1 PPSPSP PPPYYYKSPPPPSPSP PPPYYYKSPP-PPSPS PPPPYYYKSPPP 49 

Query: 164 P 164 
P 

Sbjct: 50 P 50 



Score = 68 (10.2 bits), Expect = 4.2e+00, P = 9.8e-01 
Identities = 24/69 (34%), Positives = 29/69 (42%) 



Query: 87 PPPLIEEPAFNVSYTRQPP NPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPN 143 

PPP P Y PP +P P + P PP Y+ P P P + + PP 
Sbjct: 63 PPPPDPSPPPPYYYKSPPPPSPSPPPPSPSPPPPTYSSPPPPP--PFYENIPL PPV 116 



Query: 144 SPQGSVACPPPP 155 

S A PPPP 
Sbjct: 117 IGV-SYASPPPP 127 



Peptide information for frame 3 



ORF from 0 bp to 368 bp; peptide length: 123 
Category: questionable ORF 
Classification: unset 



1 GSHEAPACEG GGAAARAALG VHRSQKALLV FRRTLSNLLY MPLLRGLLWL 
51 QVLCAGPLHT EAVVLLVPSD DGRAFLLRSR LLHPEAHVPP AADRGASLQC 
101 VLHQAAPKSR PRSPAAGAAL LH 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22k8, frame 1 



Report for DKFZphfbr2_22k8.1 



[LENGTH] 172 

[MW] 19194.47 

[pI] 8.77 

[KW] SIGNAL_PEPTIDE 23 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 27.33 % 

SEQ MRRQPAKVAALLLGLLLECTEAKKHCWYFEGLYPTYYICRSYEDCCGSRCCVRALSIQRL 

SEG xxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccceeeeccccccccccchhhhhhhhhh 

MEM 

SEQ WYFWFLLMMGVLFCCGAGFFIRRRMYPPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYT 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccceeeeecccccccccccccceeeeccccccccccccccccccc 

MEM .... MMMMMMMMMMMMMMMMM 

SEQ DPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPPPYEQVVKAK 

SEG xxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccccccceeecccccccccccccccccccccccccccccccccc 

MEM 
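
The LOW_COMPLEXITY percentage in this report can be cross-checked against the SEG track, in which each `x` marks one masked residue. The sketch below assumes exactly that reading of the SEG lines; the function name `low_complexity_percent` is hypothetical, and the three strings are the SEG rows as printed above (their column positions are not needed for the count).

```python
def low_complexity_percent(seg_lines, peptide_length):
    """Percentage of residues flagged 'x' in the SEG track, to two decimals."""
    masked = sum(line.count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# SEG rows of the frame 1 report, one per 60-residue block.
seg_rows = [
    "xxxxxxx",
    "xxxxxxxxxxxxxxxxxx",
    "xxxxxx xxxxxxxxxxxxxxxx",
]
print(low_complexity_percent(seg_rows, 172))
```

The rows contain 7 + 18 + 22 = 47 masked positions, and 47 of 172 residues corresponds to the 27.33 % given under [KW] LOW_COMPLEXITY.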



(No Prosite data available for DKFZphfbr2_22k8.1) 

Pedant information for DKFZphfbr2_22k8, frame 3 



Report for DKFZphfbr2_22k8.3 

[LENGTH] 122 
[MW] 12854.08 
[pI] 10.27 
[KW] All_Alpha 

[KW] LOW_COMPLEXITY 25.41 % 

SEQ GSHEAPACEGGGAAARAALGVHRSQKALLVFRRTLSNLLYMPLLRGLLWLQVLCAGPLHT 

SEG . . . . xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccchhhhhhhhhhhhhhhccccccchhhhhhhhcccccc 

SEQ EAVVLLVPSDDGRAFLLRSRLLHPEAHVPPAADRGASLQCVLHQAAPKSRPRSPAAGAAL 

SEG xxxxxxxxxxxxxxx . 

PRD cceeeeeccccchhhhhhhhccccccccccccccchhhhhhhhhccccccccchhhhhhc 

SEQ LH 

SEG 

PRD cc 

(No Prosite data available for DKFZphfbr2_22k8.3) 
(No Pfam data available for DKFZphfbr2_22k8.3) 

DKFZphfbr2_23b10 



group: nucleic acid management 

DKFZphfbr2_23b10 encodes a novel 580 amino acid protein with strong similarity to rat RNA 
helicase HEL117. 

HEL117 is a DEAD/H box helicase, which co-localises with a splicing factor and thus seems to 
be involved in splicing. 

The new protein can find application in modulation of splicing. 



strong similarity to rat RNA helicase HEL117 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2905 bp 

Poly A stretch at pos. 2885, no polyadenylation signal found 



1 GGGGGCTCCG CTCCGCACCA CCAACCCCGG GCCGCAGTCC TGACGAGCGG 
51 GTCAGGGCTT GTCGGGCGGA AGCCTGGCCT GGAGCCTGGA AGGGGGAGAC 
101 GGCCCGAGCG GGAGCGGGAG CGGACGCGGC CTCAGTCCTG CGCGGAATAT 
151 TGAAGGATGT TTGTTCCAAG ATCTCTAAAA ATCAAGAGGA ATGCTAATGA 
201 TGATGGCAAA AGTTGTGTGG CTAAGATAAT TAAACCAGAC CCAGAAGACC 
251 TTCAGTTGGA CAAAAGCAGA GATGTTCCCG TTGATGCTGT AGCTACAGAA 
301 GCAGCCACAA TAGACAGGCA CATCAGCGAA TCATGCCCTT TCCCCAGCCC 
351 AGGTGGCCAG TTGGCAGAGG TTCATTCAGT AAGTCCCGAG CAGGGTGCGA 
401 AGGACAGCCA TCCTTCTGAA GAGCCCGTTA AGTCATTTTC CAAAACACAG 
451 CGCTGGGCAG AACCAGGGGA ACCCATCTGT GTTGTCTGTG GTCGTTATGG 
501 AGAGTATATC TGTGATAAGA CAGATGAAGA TGTGTGTAGT TTGGAGTGTA 
551 AAGCGAAACA TCTTCTACAA GTTAAGGAAA AGGAAGAGAA ATCAAAACTC 
601 AGCAATCCAC AGAAGGCTGA TTCTGAGCCA GAGTCTCCAC TGAATGCTTC 
651 CTATGTCTAC AAAGAGCACC CCTTTATTTT GAACCTTCAG GAAGACCAGA 
701 TTGAAAATCT TAAACAGCAG CTGGGAATTT TAGTTCAAGG GCAAGAAGTC 
751 ACCAGGCCCA TTATTGACTT TGAACATTGT AGTCTCCCTG AGGTCTTAAA 
801 TCACAACTTG AAGAAATCAG GCTATGAGGT GCCAACTCCC ATTCAAATGC 
851 AGATGATTCC TGTGGGACTT CTGGGAAGAG ACATTCTGGC CAGTGCAGAT 
901 ACTGGCTCAG GAAAAACAGC TGCTTTTCTT CTTCCTGTTA TCATGCGAGC 
951 TTTATTCGAG AGCAAAACTC CATCTGCGCT CATTCTTACA CCAACCAGAG 
1001 AGTTAGCCAT TCAGATAGAG AGACAAGCTA AAGAATTGAT GAGTGGCCTG 
1051 CCACGCATGA AAACTGTGCT TCTTGTAGGG GGCTTACCCT TACCCCCACA 
1101 GCTTTATCGT CTGCAACAAC ATGTTAAGGT TATCATAGCA ACCCCTGGGC 
1151 GACTTCTGGA TATAATAAAG CAGAGCTCTG TAGAACTCTG TGGTGTAAAG 
1201 ATTGTGGTAG TAGATGAAGC TGATACCATG TTAAAGATGG GTTTTCAACA 
1251 ACAAGTGCTT GACATTTTGG AAAACATTCC TAATGATTGT CAGACCATTT 
1301 TGGTTTCAGC CACAATTCCA ACTAGCATAG AACAGCTAGC AAGCCAGCTT 
1351 CTGCATAATC CTGTGAGAAT TATCACTGGA GAAAAGAACC TACCTTGTGC 
1401 CAATGTACGT CAGATTATTT TGTGGGTAGA AGACCCAGCC AAAAAGAAAA 
1451 AATTATTTGA AATTTTAAAT GATAAGAAAC TCTTTAAGCC TCCAGTGTTA 
1501 GTATTTGTGG ACTGCAAACT AGGAGCAGAT CTTTTGAGTG AAGCCGTTCA 
1551 GAAAATCACA GGGCTGAAAA GCATATCTAT ACATTCGGAG AAGTCGCAAA 
1601 TAGAAAGGAA AAACATATTG AAGGGATTAC TTGAAGGAGA CTATGAAGTT 
1651 GTAGTGAGCA CAGGAGTCTT GGGACGAGGC CTAGACTTGA TCAGTGTCAG 
1701 GCTGGTTGTC AATTTTGATA TGCCTTCAAG TATGGATGAG TATGTCCATC 
1751 AGGAAAATAC CTACAAGTCT ACTTGGAGGA ATCCCCAGCA TTTTCAACAG 
1801 GATGTCAGAA TGACCTTGGG CTATGTTGGC AAAGCACAAT GGGAAGAAGA 
1851 CAACCAATTG AAGGTCAAAC TAGGCCTTAA AAAAAATTGT TCTTCCTAAA 
1901 TGAAACTTTA TGTAAGACCC AAGCTTCCTT TATGTAAAAA TAGGATACTC 
1951 ACTAGGCTTT GGGGCTGACA ATGGTTTTTA AATCTTGCTA ATCTTCCCTG 
2001 GAATGAAACC AGCATGACTC AAAGAGAAAA AGAGAGTCTA TAATATTTTC 
2051 TAATCCCTGA GTTCTTTTCT TTATATATTA AAAAGGATTA TTAGGCTGGG 
2101 TGTGGTGGCT CACGCCTGTA ATCCCAGCAC TTTGGGAGGC CGAGGGGAGT 
2151 GGATCACCTG AGTTCGAGAC CAGCCTAACC AACATGGAGA AACCCTGTCT 
2201 CTACTAAAAA TACAAAATTA GCCAGGCGTG GTGGCGCATG CCTGTAATCC 
2251 CAGCTACTCA GGAGGCTACA GCAGGAGAAT TGCTTGAACT CGGGAGGCAG 
2301 AGCCAAGATC GCACCACTGC ACTCCAGCCT GGGCAACAAG AGTGAAACTC 
2351 TGTCTCAAAA TAATATTAAT GATAATAATA ATAATAATAA TAGGGATTAC 
2401 TTGCATAATT GTTCTTTTAA AATTATTGGC AGTATTGCTG AATGTATTTA 
2451 GATTTTTTCA CCAAGTGACA ACAACTGAAT TCATAAAGAT TCATCAACAA 
2501 GACCTGATAA AAAAAAATGT AAGCATATTA TAGTGGATAC TTCCAAGACT 
2551 CTTGGTCTAA CATGTATTAG AAAGCAGAAG GAGCCCAGGC ACAGGGGCTC 
2601 CCGCCGGTAA TCCCAAAGCT TTGGGAAGCC AAGGCAGGTG GATCGCTTGA 
2651 GCTCAGGAGT TAGAGACCAG CCTGGGCAAC ATGGTGAAAT CCCGTCACCA 
2701 CAAAAAAATG CAAAAATTAA CTGGGCGTGG TGGCATGCAC CTGTAGTCCC 
2751 AGCTACTCTG GAGGCTGAGG TGAGGGGAAT CACCTGAGCC GGGGGAATCA 
2801 CCTGAGCCCA GGGAAGTTGA GGCTGCTGTG AGCCATGGTC ATGACACTGC 
2851 CCTCCAGCCT GGACAACAGA TTGAGACCCT GTCTCAAAAA AAAAAAAAAA 
2901 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



Medline : 

A putative mammalian RNA helicase with an arginine-serine-rich 
domain 



Peptide information for frame 1 



ORF from 157 bp to 1896 bp; peptide length: 580 
Category: strong similarity to known protein 
Prosite motifs: ATP_GTP_A (247-255) 
LEUCINE_ZIPPER (298-320) 



1 MFVPRSLKIK RNANDDGKSC VAKIIKPDPE DLQLDKSRDV PVDAVATEAA 

51 TIDRHISESC PFPSPGGQLA EVHSVSPEQG AKDSHPSEEP VKSFSKTQRW 

101 AEPGEPICVV CGRYGEYICD KTDEDVCSLE CKAKHLLQVK EKEEKSKLSN 

151 PQKADSEPES PLNASYVYKE HPFILNLQED QIENLKQQLG ILVQGQEVTR 

201 PIIDFEHCSL PEVLNHNLKK SGYEVPTPIQ MQMIPVGLLG RDILASADTG 

251 SGKTAAFLLP VIMRALFESK TPSALILTPT RELAIQIERQ AKELMSGLPR 

301 MKTVLLVGGL PLPPQLYRLQ QHVKVIIATP GRLLDIIKQS SVELCGVKIV 

351 VVDEADTMLK MGFQQQVLDI LENIPNDCQT ILVSATIPTS IEQLASQLLH 

401 NPVRIITGEK NLPCANVRQI ILWVEDPAKK KKLFEILNDK KLFKPPVLVF 

451 VDCKLGADLL SEAVQKITGL KSISIHSEKS QIERKNILKG LLEGDYEVVV 

501 STGVLGRGLD LISVRLVVNF DMPSSMDEYV HQENTYKSTW RNPQHFQQDV 
551 RMTLGYVGKA QWEEDNQLKV KLGLKKNCSS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_23b10, frame 1 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 615, P = 1.6e-60 

TREMBL:AB018344_1 gene: "KIAA0801"; product: "KIAA0801 protein"; Homo 
sapiens mRNA for KIAA0801 protein, complete cds., N = 1, Score = 615, P 
= 2.8e-59 

TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid 
F01F1., N = 2, Score = 365, P = 1.9e-58 



TREMBL:AF083255_1 product: "RNA helicase-related protein"; Homo 
sapiens RNA helicase-related protein mRNA, complete cds., N = 2, Score 
= 556, P = 1.5e-57 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 591, P = 1.6e-57 



>PIR:A57514 RNA helicase HEL117 - rat 
Length = 1,032 



HSPs: 



Score = 615 (92.3 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 140/394 (35%), Positives = 236/394 (59%) 



Query: 144 EKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQEDQIENLKQQL-GILVQGQEVTRPI 202 
++ KL P P ++ Y E P + + ++++ + ++ GI V+G+ +PI 
Sbjct: 313 KQRKLLEPVDHGKIEYEPFRKNF-YVEVPELAKMSQEEVNVFRLEMEGITVKGKGCPKPI 371 

Query: 203 IDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAFLLPV- 261 
+ C + + ++LKK GYE PTPIQ Q IP + GRD++ A TGSGKT AFLLP+ 
Sbjct: 372 KSWVQCGISMKILNSLKKHGYEKPTPIQTQAIPAIMSGRDLIGIAKTGSGKTIAFLLPMF 431 

Query: 262 --IM--RALFESKTPSALILTPTRELAIQIERQAKELMSGLPRMKTVLLVGGLPLPPQLY 317 
IM R+L E + P A+I+TPTRELA+QI ++ K+ L ++ V + GG + Q+ 
Sbjct: 432 RHIMDQRSLEEGEGPIAVIMTPTRELALQITKECKKFSKTLG-LRVVCVYGGTGISEQIA 490 

Query: 318 RLQQHVKVIIATPGRLLDIIKQSS VELCGVKIVVVDEADTMLKMGFQQQVLDILENI 374 
L++ ++I+ TPGR++D++ +S L V VV+DEAD M MGF+ QV+ I++N+ 
Sbjct: 491 ELKRGAEIIVCTPGRMIDMLAANSGRVTNLRRVTYVVLDEADE^MFDMGFEPQVMRIVDNV 550 

Query: 375 PNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQIILWVEDPAKKKKLF 434 
D QT++ SAT P ++E LA ++L P+ + G +++ C++V Q ++ +E+ K KL 
Sbjct: 551 RPDRQTVMFSATFPRAMEALARRILSKPIEVQVGGRSVVCSDVEQQVIVIEEEKKFLKLL 610 

Query: 435 EILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEG 494 
E+L + V++FVD + AD L + + + + +S+H Q +R +I+ G 
Sbjct: 611 ELLGHYQE-SGSVIIFVDKQLHADGLLKDLMRAS-YPLMSLHGGIDQYDRDSIINDrKNG 668 

Query: 495 DYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQ 532 
+ + +V+T V RGLD+ + LVVN+ P+ ++YVH+ 
Sbjct: 669 TCKLLVATSVAARGLDVKHLILVVNYSCPNHYEDYVHR 706 

Score = 37 (5.6 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 13/36 (36%), Positives = 17/36 (47%) 

Query: 132 KAKHLLQVKEKEE---KSKLSNPQKADSEPESPLNA 164 
KA++ + KEK E SK K D E E +A 
Sbjct: 113 KAENRSRSKEKAEGGDSSKEKKKDKDDKEDEKEKDA 148 






Pedant information for DKFZphfbr2_23b10, frame 1 

Report for DKFZphfbr2_23b10.1 

[LENGTH] 580 
[MW] 64572.24 
[pI] 6.13 
[HOMOL] TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid F01F1. 8e-61 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 5e-53 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-49 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204W] 2e-49 
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-46 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 3e-43 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 4e-39 
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-35 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 6e-34 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 3e-32 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 8e-30 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 5e-23 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-16 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-11 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-06 
[BLOCKS] BL00115B Eukaryotic RNA polymerase II heptapeptide repeat proteins 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 6e-53 
[PIRKW] RNA binding 9e-52 
[PIRKW] DEAD box 2e-43 
[PIRKW] transmembrane protein 1e-21 
[PIRKW] DNA binding 5e-48 
[PIRKW] ATP 4e-57 
[PIRKW] purine nucleotide binding 2e-43 
[PIRKW] P-loop 4e-57 
[PIRKW] hydrolase 6e-42 
[PIRKW] protein biosynthesis 2e-43 
[PIRKW] ATP binding 2e-50 
[SUPFAM] WW repeat homology 1e-49 
[SUPFAM] translation initiation factor eIF-4A 2e-43 
[SUPFAM] DEAD/H box helicase homology 4e-57 
[SUPFAM] recQ helicase homology 8e-06 
[SUPFAM] unassigned DEAD/H box helicases 4e-57 
[SUPFAM] ATP-dependent RNA helicase DBP1 2e-53 
[SUPFAM] ATP-dependent RNA helicase DHH1 6e-40 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-49 
[SUPFAM] Bloom's syndrome helicase 8e-06 
[PROSITE] ATP_GTP_A 1 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 3.10 % 



SEQ MFVPRSLKIKRNANDDGKSCVAKIIKPDPEDLQLDKSRDVPVDAVATEAATIDRHISESC 
SEG 
PRD ccccceeeeccccccccceeeeeeeeccccceeecccccccccchhhhhhhhhhhhcccc 

SEQ PFPSPGGQLAEVHSVSPEQGAKDSHPSEEPVKSFSKTQRWAEPGEPICVVCGRYGEYICD 
SEG 
PRD cccccccceeeeccccccccccccccccccccccccccccccccccceeeeccccceeec 

SEQ KTDEDVCSLECKAKHLLQVKEKEEKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQED 
SEG 
PRD cccccccchhhhhhhhhhhhhhccccccccccccccccccccccceeeccccccccchhh 

SEQ QIENLKQQLGILVQGQEVTRPIIDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLG 
SEG 
PRD hhhhhhhhheeeeccccccccccccccccchhhhhhhhhhhccccccccccccceeeecc 

SEQ RDILASADTGSGKTAAFLLPVIMRALFESKTPSALILTPTRELAIQIERQAKELMSGLPR 
SEG 
PRD cceeeeeccccccceeeehhhhhhhhcccccceeeeecchhhhhhhhhhhhhhhhccccc 

SEQ MKTVLLVGGLPLPPQLYRLQQHVKVIIATPGRLLDIIKQSSVELCGVKIVVVDEADTMLK 
SEG . . . xxxxxxxxxxxxxxxxxx 
PRD eeeeeeecccccchhhhhhhhheeeeeeccccchhhhhhheeeeeeeeeeeehhhhhhhh 

SEQ MGFQQQVLDILENIPNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQI 
SEG 
PRD cccchhhhhhhhhcccccceeeeecccchhhhhhhhhhhhceeeeeeeccccccccccce 

SEQ ILWVEDPAKKKKLFEILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKS 
SEG 
PRD eeecccchhhhhhhhhhhhhccccceeeeeeecccchhhhhhhhhhhhccceeeccccch 

SEQ QIERKNILKGLLEGDYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQENTYKSTW 
SEG 
PRD hhhhhhhhhhhccccceeeeehhhhhhcccceeeeeeeeecccccccceeeecccccccc 

SEQ RNPQHFQQDVRMTLGYVGKAQWEEDNQLKVKLGLKKNCSS 
SEG 
PRD ccccccchhhhhhhccccchhhhhhhhhhhhhhhcccccc 



Prosite for DKFZphfbr2_23b10.1 

PS00001 163->167 ASN_GLYCOSYLATION PDOC00001 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 97->100 PKC_PHOSPHO_SITE PDOC00005 
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005 
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005 
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005 
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005 
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005 
PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00006 389->393 CK2_PHOSPHO_SITE PDOC00006 
PS00006 480->484 CK2_PHOSPHO_SITE PDOC00006 
PS00006 524->528 CK2_PHOSPHO_SITE PDOC00006 
PS00007 489->497 TYR_PHOSPHO_SITE PDOC00007 
PS00008 66->72 MYRISTYL PDOC00008 
PS00008 80->86 MYRISTYL PDOC00008 
PS00008 195->201 MYRISTYL PDOC00008 
PS00008 250->256 MYRISTYL PDOC00008 
PS00008 490->496 MYRISTYL PDOC00008 
PS00008 573->579 MYRISTYL PDOC00008 
PS00017 247->255 ATP_GTP_A PDOC00017 
PS00029 298->320 LEUCINE_ZIPPER PDOC00029 



Pfam for DKFZphfbr2_23b10.1 

HMM_NAME DEAD and DEAH box helicases 

HMM k gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
+LP+ + N+++ G+E PTPIQ+Q IP+ L GRD++A A TGSGKTAAF 
Query 209 SLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAF 257 

HMM llPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR 
L+P++ + + + ++P ALIL+PTRELA+QI +++++++ + ++ ++ 
Query 258 LLPVIMRALFES--KTPS---ALILTPTRELAIQIERQAKELMSGLPRMK 302 

HMM ImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleMLV 
++++GG+++ +Q+ +L++ + ++IATPGRL+D+I++ ++ L ++++V 
Query 303 TVLLVGGLPLPPQLYRLQQHV-KVIIATPGRLLDIIKQSSVELCGVKIVV 351 

HMM MDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARrFM 
DEAD ML MGF++Q+ +1+ IP + QT++ SAT+P +I++LA ++ 
Query 352 VDEADTMLKMGFQQQVLDILENIP--NDCQTILVSATIPTSIEQLASQLL 399 

HMM RNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 
+NP+RI+ ++++L N++Q++ +VE + K +L+++++ 
Query 400 HNPVRIITGEKNLPCA-NVRQIILWVE-DPAKKKKLFEILN 438 

HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl . GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTDVgg 
++L+E ++ G++ ++IH+ ++Q ER +I++ +G+Y V ++T V+.G 
Query 458 DLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEGDYEVVVSTGVLG 506 

HMM RGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG* 
RG+D+++V++V+N+DMP +++ Y++ + T + 
Query 507 RGLDLISVRLVVNFDMPSSMDEYVH-QENTYKST 539 

DKFZphfbr2_23b21 



group: signal transduction 

DKFZphfbr2_23b21.1 encodes a novel 193 amino acid protein which is nearly identical to bovine 
neurocalcin. 

Neurocalcin is a Ca(2+)-binding protein with three putative Ca(2+)-binding domains (EF-hands). 
In cattle, 6 isoforms are differentially expressed in the central nervous system, retina and 
adrenal gland. Homology with recoverin indicates involvement in Ca(2+)-dependent activation of 
guanylate cyclase. 

The new protein can find application in modulating/blocking the guanylate cyclase pathway. 



nearly identical to bovine neurocalcin 

complete cds complete cDNA 
EST hits 

Sequenced by AGOWA 

Locus: /map="574.6 cR from top of Chr8 linkage group" 
Insert length: 3300 bp 

Poly A stretch at pos. 3279, polyadenylation signal at pos. 3249 



1 GGGGAGAATC TGGTGGATGC TGGAGCTTGC TGCTGCTGCT ACTGCTGTTT 
51 CCAGGGGCTG CAGAGCATGG ACTGTTAAAT CTTGCACTTC TTCTGAGTGA 
101 GCTGAATTCT TGCCGCCAGG ATGGGGAAAC AGAACAGCAA GCTGCGCCCG 
151 GAGGTCATGC AGGACTTGCT GGAAAGCACA GACTTTACAG AGCATGAGAT 
201 CCAGGAATGG TATAAAGGCT TCTTGAGAGA CTGCCCCAGT GGACATTTGT 
251 CAATGGAAGA GTTTAAGAAA ATATATGGGA ACTTTTTCCC TTATGGGGAT 
301 GCTTCCAAAT TTGCAGAGCA TGTCTTCCGC ACCTTCGATG CAAATGGAGA 
351 TGGGACAATA GACTTTAGAG AATTCATCAT CGCCTTGAGT GTAACTTCGA 
401 GGGGGAAGCT GGAGCAGAAG CTGAAATGGG CCTTCAGCAT GTACGACCTG 
451 GACGGAAATG GCTATATCAG CAAGGCAGAG ATGCTAGTGA TCGTGCAGGC 
501 AATCTATAAG ATGGTTTCCT CTGTAATGAA AATGCCTGAA GATGAGTCAA 
551 CCCCAGAGAA AAGAACAGAA AAGATCTTCC GCCAGATGGA CACCAATAGA 
601 GACGGAAAAC TCTCCCTGGA AGAGTTCATC CGAGGAGCCA AAAGCGACCC 
651 GTCCATTGTG CGCCTCCTGC AGTGCGACCC GAGCAGTGCC GGCCAGTTCT 
701 GAGCCCTGCG CCCACCAATC GAATTGTAGA GCTGCTTGTG TTCCCTTTTG 
751 ATTCTTCTTT TTAACAATTT TTTTTTTTTT TTGCCAAACA ATATCAATGG 
801 TGATGCCGTC CCCTGTGCGG TCTGATGCGC CTTCCTCCGT GACGCCTTCA 
851 GCCTCTTTTG TCGTGGATGC TTCGTGGGAA TGCCCAGAGC CCCAGTGTGC 
901 TTGTGGAGAG CATGGACAGA CTTCGTGGTG TTCATTGTTT GATGATTTTT 
951 AATCGTTACT ATTATTTCTT TTTATTCTAA TGTCTCTGTT CTAAAACGTA 
1001 AGACTCGGGG GTTGGGGCAA AAGAAGGGAA ACCCATCCAG TCCTGTGATT 
1051 CTATTGCAAG CTTCAAGGGG CTTTTGTTTG AAAGACAAAA CTCCCCACCT 
1101 GGGTCTGTTG TCACACGTGC CGTAGGGGTG ATGGATGGCA CCGGATGCTG 
1151 GATTCCCCAA GAACAAGTTA CCCTCTGGGG TGAGGCTATT CCAGCGAGCT 
1201 GGGACATTTC CCCATGGGGG CCCACTCCCC TCTCTTCCCC AGCAGGCTGT 
1251 AGTTTCTAAG CTGTGAACAT TTCAAGATAA ATTAACAGAG GAGAGGAAAA 
1301 AGATGGCTCA GCTATTTTTT CACAGGTTTA CACTAGTTGA GCTAATATGC 
1351 GTGTCTTTGG AAATTAAACA CAAATGGTAA CATATTCCAA AACCAGACCC 
1401 ATCTTGTTGC CTATTGTGAT AAAATAAAAA GACGGCTGTA TATAACATAT 
1451 TGGGTAATGC AGACCAAATT AAGTGTTTTG CCTTGTTTAA ATGAAATGCA 
1501 TGTTTAGTGA GCACTAATAC AATCTTATTC CAGAAGACTG TTTTTAGTAG 
1551 CTTATTGTGA AGTAAGACAA CTATAATGAA TGTCTGTCTT GTTTGGAAGT 
1601 CATATCTGTC TTTGCACAAA TGTACCAATC GACAAGTATA TTTTATATAT 
1651 TCCATAAAAA TACAAAGTAA CCCTGACTAG GGCCCAACTT TAATTTTGAA 
1701 TGCATTTCCA GAGTGGCCAT GCCTAGAGGG CAGATGCAGA GCAGGTGGTA 
1751 GTGGGACAGG ACAATTGGAG CACAGGAATG TTAACATGTA TGACAGGGGA 
1801 CCAGTAGGGT GGTTTCCCTC TCAGGCCCAG CAGCCCATTG ACAGCATTAG 
1851 ACTGGCGGCA TGGTGCTTTT CTGAGCAGAT CAATACTCTG CAGACTCGAA 
1901 AAAACATCAC ATACATTCTT GGAACTTCCC AGTGGTTTAA TCTATGTGCA 
1951 TGGTTAGGGA GCCAGGCCTG GAATATTCAG TTTCCCTGCC CCTGTTAAAG 
2001 AATCAGAGGT TGGGCAGTCA TCAAATTCAT CATAAAGACA TGGGCAAGTG 
2051 TGTCTGTGGT TTCCAAGGCC CCCCTATGGA GAATCCAAAA GTATTTTCCA 
2101 TTGCCGTGCT CTTTGAATGC AGACTTCTAT TTCCAGAAGT GACAGCACAA 
2151 GTCTGAGTTG CTGTTTGGTC TGGTGACCTC AGACACACTA ATTTGAATTG 
2201 AAAGCTAAGA GTAAAAATTT GCTGGTTACA GGCGAGTCAT ACTCTTGCAA 
2251 GTAGTTAGCA AAGGGAGGCC CAAATTCTCA AGGTTGTTGA TGGGGAACTT 
2301 GCCACTAAGA GAAGGCAGAG AGGTCCCTAG TGGGTATATT TGCTGCCAAG 
2351 CCACTTGCCA AAGAAGAGGA ACCACAGAAA GAGAGACATC ATGACCAGGA 
2401 GAAAAATGTG ACTAGACATG CTAACCTCCA GGTTTTTATA TATGACTTGA 
2451 GTCTGCTGTA ATTGGCAGCA GAAATCCAAA TTTGTATGGT AGACCAAAAA 
2501 GAACCAAATC CATAGGGTGA AATTTTGAGA CCTAGACTCT GTAAAAATAA 
2551 TCCTAGTCTT CCTCCAGGGG TCAGTTCCTC ACAGTGGTTC TGTACCAAAA 
2601 CTTGCCAAAT TCCTCCATGG CCAAGTGTTA AAATCTGTGT TTGGAAAATA 
2651 GCGAATTAAC CTAAGACACA GAAGGCAGAC TGGGTGAGGA GACCTAGCAT 
2701 GCCCTATTGG CAGTGCTCAG GAGCTGCATC CCACTTTTCC CTGCTCTGAA 
2751 TCGAAGTCCT AGTTCCTTCC TTTGATTCTC CTTTGGTAGG TGGAATCAGT 
2801 TAATGTTTTG AGAAACCTGC CTGGGCTCTG CCCTTAGTCA TGACATCTCG 
2851 CTGAGCCAGA CCCACTCTGT TCCTTGGAAC CTAGAGCTGG AGTGAGGAGT 
2901 AGAGGTCTCC GGCTATTCCA GAAAGAAAAG TGAGCCACAT GCAGGCTGAT 
2951 GAATGCCGAC ACTTCCAGAA TGTATAGAAA TAGTCCCTGT CCTGGCCTGC 
3001 CACTGACCCT GTCTGTATTT TCTCGGAGGT TGTTTTTCTC CTTCTCCTTC 
3051 CCAGGAAGGT CTTTGTATGT CGAATCCAGT GCACTCAAGT TTGGCCAAGG 
3101 GACTCCACAG CACCCAGAGG ACTGCATGCC TCAAGGTTTA TGTCACTCCT 
3151 CTGCTGGGCT GTTCATTGTC ATTGCTGTGT TCAGGGACCT TTGGAAATAA 
3201 AACCTGTTCT GTCCCAAATA AAACCAGCCT GTGATGTTCA AGGGACTGGA 
3251 ATAAAGTGGC TTACGACCTG AAGGATTCTA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



Entry HS431350 from database EMBL: 
human STS WI-15914. 
Score = 1308, P = 3.1e-53, identities = 276/285 

Entry HSG19929 from database EMBL: 
human STS A002C26. 
Score = 926, P = 1.5e-35, identities = 186/187 

Entry AF052142 from database EMBL: 

Homo sapiens clone 24665 mRNA sequence. 

Score = 7378, P = 0.0e+00, identities = 1482/1487 

3' UTR 



Medline entries 



93247712: 

Neurocalcin family: a novel calcium-binding protein abundant in bovine 
central nervous system. 

94045365: 

Distinct regional localization of neurocalcin, a Ca(2+)-binding 
protein, in the bovine adrenal gland. 

96407688: 

Crystallization and preliminary X-ray crystallographic studies of 
recombinant bovine neurocalcin delta. 

96066284: 

Distribution pattern of three neural calcium-binding proteins (NCS-1, 
VILIP and recoverin) in chicken, bovine and rat retina. 



Peptide information for frame 1 



ORF from 121 bp to 699 bp; peptide length: 193 
Category: strong similarity to known protein 
Prosite motifs: EF_HAND (73-86) 
EF_HAND (109-122) 
EF_HAND (157-170) 



1 MGKQNSKLRP EVMQDLLEST DFTEHEIQEW YKGFLRDCPS GHLSMEEFKK 
51 IYGNFFPYGD ASKFAEHVFR TFDANGDGTI DFREFIIALS VTSRGKLEQK 
101 LKWAFSMYDL DGNGYISKAE MLVIVQAIYK MVSSVMKMPE DESTPEKRTE 
151 KIFRQMDTNR DGKLSLEEFI RGAKSDPSIV RLLQCDPSSA GQF 
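
The peptide above can be checked against the cDNA by translating the ORF with the standard genetic code. The following is a minimal Python sketch under stated assumptions: the ORF coordinates are taken as 1-based, the helper name `translate` and the table-building idiom are illustrative choices, and only the first 30 bases of the ORF (positions 121 to 150 of the sequence listed above) are used as input.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    cds = cds.upper()
    peptide = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 121-150 of the DKFZphfbr2_23b21 cDNA as listed above.
orf_start = "ATGGGGAAAC" "AGAACAGCAA" "GCTGCGCCCG"
print(translate(orf_start))  # MGKQNSKLRP
```

The ten residues obtained match the first block of the listed peptide. Translating the full range 121..699 in the same way would be the natural extension.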

BLASTP hits 

Entry JH0616 from database PIR: 
neurocalcin (clone pCalN) - bovine 
Score = 1001, P = 5.2e-101, identities = 192/193, positives = 192/193 

Entry GGU91630_1 from database TREMBL: 
product: "neurocalcin"; Gallus gallus neurocalcin mRNA, complete cds. 
Score = 998, P = 1.1e-100, identities = 191/193, positives = 192/193 

Entry NECD_BOVIN from database SWISSPROT: 
NEUROCALCIN DELTA. 
Score = 996, P = 1.8e-100, identities = 191/192, positives = 191/192 

Entry S47565 from database PIR: 
BDR-1 protein - human 
Score = 934, P = 6.5e-94, identities = 174/193, positives = 187/193 

Entry I50676 from database PIR: 
gene Rem-1 protein - chicken >TREMBL:GGREM1_1 gene: "Rem-1"; G. gallus 
rem-1 mRNA 
Score = 933, P = 8.4e-94, identities = 174/193, positives = 186/193 



Alert BLASTP hits for DKFZphfbr2_23b21, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23b21, frame 1 

Report for DKFZphfbr2_23b21.1 

[LENGTH] 193 
[MW] 22215.30 
[pI] 5.35 
[HOMOL] PIR:JH0616 neurocalcin (clone pCalN) - bovine 1e-109 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YDR373w] 3e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 10.02.99 other morphogenetic activities [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YBR109c] 0.001 
[BLOCKS] BL00018 
[SCOP] d1rec_ 1.34.1.5.18 Recoverin [bovine (Bos taurus) 8e-55 
[SCOP] d1jsa_ 1.34.1.5.17 Recoverin [human (Homo sapiens) 5e-58 
[SCOP] d1tcob 1.34.1.5.16 Calcineurin regulatory subunit (B-chain 1e-06 
[SCOP] d2mysc 1.34.1.5.15 Myosin Regulatory Chain [chicken (Gallu 2e-29 
[SCOP] d1scmc 1.34.1.5.14 Myosin Regulatory Chain [bay scallo 5e-33 
[SCOP] d2mysb 1.34.1.5.13 Myosin Essential Chain [chicken (Gallu 4e-26 
[SCOP] d1scmb 1.34.1.5.12 Myosin Essential Chain [bay scallo 6e-27 
[SCOP] d1clm_ 1.34.1.5.11 Calmodulin [Paramecium tetraurelia 1e-15 
[SCOP] d4cln_ 1.34.1.5.10 Calmodulin [Drosophila melanogaster 2e-16 
[SCOP] d1cfc_ 1.34.1.5.9 Calmodulin [African frog (Xenopus laevis) 2e-16 
[SCOP] d1ahr_ 1.34.1.5.8 Calmodulin [chicken (Gallus gallus) 4e-16 
[SCOP] d3cln_ 1.34.1.5.7 Calmodulin [rat (Rattus rattus) 2e-16 
[SCOP] d1trcb 1.34.1.5.6 Calmodulin [bovine (Bos taurus) 8e-08 
[SCOP] d1cll_ 1.34.1.5.5 Calmodulin [human (Homo sapiens) 2e-16 
[SCOP] d1rtp1 1.34.1.4.5 Parvalbumin [rat (Rattus rattus) 8e-06 
[SCOP] d5tnc_ 1.34.1.5.2 Troponin C [turkey (Meleagris gallopavo) 3e-13 
[SCOP] d1pvaa 1.34.1.4.3 Parvalbumin [pike (Esox lucius) 6e-06 
[SCOP] d1tnp_ 1.34.1.5.1 Troponin C [chicken (Gallus gallus) 9e-11 
[EC] 2.7.1.107 Diacylglycerol kinase 2e-08 
[PIRKW] blocked amino end 1e-100 
[PIRKW] phosphotransferase 2e-08 
[PIRKW] duplication 4e-17 
[PIRKW] tandem repeat 7e-06 
[PIRKW] heterodimer 4e-17 
[PIRKW] heart 6e-09 
[PIRKW] zinc 2e-08 
[PIRKW] serine/threonine-specific protein kinase 1e-06 
[PIRKW] muscle contraction 1e-08 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] ATP 2e-08 
[PIRKW] skeletal muscle 6e-09 
[PIRKW] signal transduction 1e-91 
[PIRKW] piUUciii Miidoc £ c uo 
[PIRKW] calcium binding 1e-100 
[PIRKW] alternative splicing 2e-13 
[PIRKW] Illtr Lily -La L CU dllLLllU dL 1 U J. c U zj 
[PIRKW] thin filaments 1e-08 
[PIRKW] lipoprotein 1e-101 
[PIRKW] cardiac muscle 6e-09 
[PIRKW] muscle 6e-09 
[PIRKW] myristylation 1e-100 
[PIRKW] nr nana le iui 
[PIRKW] retina 2e-51 
[SUPFAM] calcium-dependent protein kinase 2e-08 
[SUPFAM] unassigned calmodulin-related proteins 8e-41 
[SUPFAM] spec-related protein LpS1 7e-06 
[SUPFAM] calmodulin repeat homology 1e-101 
[SUPFAM] human diacylglycerol kinase 2e-08 
[SUPFAM] Lj_LLJl_C_Lll I\ll[a JC Lj Z. _L 1 L l_. LJ-LL1L1-L11U I CUCa L IlLJinOLOLjy £ fcT UO 
[SUPFAM] protein kinase homology 2e-08 
[SUPFAM] calmodulin 1e-101 
[PROSITE] EF_HAND 3 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PFAM] EF hand 
[KW] All_Alpha 
[KW] 3D 


SEQ MGKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGD 

1rec- HHHHHHHHHTTTTCCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTC 

SEQ ASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKAE 

1rec- HHHHHHHHHHHH CEEEHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHH 

SEQ MLVIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIV 

1rec- HHHHHHHHHHCCTTGGGCTTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHH 

SEQ RLLQCDPSSAGQF 

1rec- HHHCCCH 



Prosite for DKFZphfbr2_23b21.1 

PS00005 92->95 PKC_PHOSPHO_SITE PDOC00005 
PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00006 23->27 CK2_PHOSPHO_SITE PDOC00006 
PS00006 44->48 CK2_PHOSPHO_SITE PDOC00006 
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006 
PS00006 117->121 CK2_PHOSPHO_SITE PDOC00006 
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006 
PS00006 158->162 CK2_PHOSPHO_SITE PDOC00006 
PS00006 165->169 CK2_PHOSPHO_SITE PDOC00006 
PS00018 73->86 EF_HAND PDOC00018 
PS00018 109->122 EF_HAND PDOC00018 
PS00018 157->170 EF_HAND PDOC00018 
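
The phosphorylation-site rows of the Prosite table above can be reproduced by scanning the peptide with the corresponding PROSITE patterns, PS00005 `[ST]-x-[RK]` and PS00006 `[ST]-x(2)-[DE]`. The sketch below is illustrative: the function name `scan` is hypothetical, the peptide is the 193-residue frame 1 sequence listed earlier, and the coordinate convention (1-based start, end equal to start plus pattern length) is an assumption inferred from the table rather than a documented Pedant format.

```python
import re

# Frame 1 peptide of DKFZphfbr2_23b21, written in blocks of ten as in the listing.
PEPTIDE = "".join("""
MGKQNSKLRP EVMQDLLEST DFTEHEIQEW YKGFLRDCPS GHLSMEEFKK
IYGNFFPYGD ASKFAEHVFR TFDANGDGTI DFREFIIALS VTSRGKLEQK
LKWAFSMYDL DGNGYISKAE MLVIVQAIYK MVSSVMKMPE DESTPEKRTE
KIFRQMDTNR DGKLSLEEFI RGAKSDPSIV RLLQCDPSSA GQF
""".split())

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",     # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",    # PS00006: [ST]-x(2)-[DE]
}


def scan(pattern: str, seq: str):
    """Return (start, end) pairs for all matches, including overlapping ones."""
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):   # lookahead keeps overlaps
        start = m.start() + 1                        # convert to 1-based
        hits.append((start, start + len(m.group(1))))
    return hits


for name, pattern in PATTERNS.items():
    for start, end in scan(pattern, PEPTIDE):
        print(f"{name} {start}->{end}")
```

The lookahead form is used so that sites sharing residues, such as the PKC and CK2 matches that both start at position 158, are all reported.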



Pfam for DKFZphfbr2_23b21.1 

HMM_NAME EF hand 

HMM *MFrmMDkDGDGyIDFEEFmeMMkem* 
+FR +D +GDG+IDF EF+ +++ 
Query 68 VFRTFDANGDGTIDFREFIIALSVT 92 

30.75 100 128 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus: 
HMM *EIqEMFrmMDkDGDGyIDFEEFmeMMkem* 
++++F+M+D DG+GYI++ E++++++++ 
dkfzphfbr2 100 KLKWAFSMYDLDGNGYISKAEMLVIVQAI 128 

Query 176 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus: 
HMM *EIqEMFrmMDkDGDGyIDFEEFmeMMkem* 
+++FR MD+++DG+++ EEF++ K+ 
Query 148 RTEKIFRQMDTNRDGKLSLEEFIRGAKSD 176 
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DKFZphfbr2_23f2 



group: brain derived 



DKFZphfbr2_23f2 encodes a novel 182 amino acid protein with weak similarity to S. pombe
Vps29p.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to Vps29p

complete cDNA, complete cds, EST hits

S. cerevisiae and S. pombe Vps29p are involved in vacuolar protein
sorting

part of the cDNA is encoded by HSAC2350, splice pattern 4 exons
Sequenced by AGOWA
Locus: /map="12q24"
Insert length: 1016 bp

Poly A stretch at pos. 996, polyadenylation signal at pos. 974
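
The polyadenylation signal position quoted above can be checked against the sequence listing with a short script. The following is a minimal sketch, not the tool used to produce the report: it assumes the canonical hexamer AATAAA is what is meant by "polyadenylation signal", and it uses only the last 66 nucleotides of the insert (positions 951 to 1016) as copied from the listing. The names `TAIL`, `TAIL_START` and `signal_positions` are illustrative.

```python
# Last 66 nucleotides of the 1016 bp insert (positions 951-1016), from the listing.
TAIL = (
    "GTATACATTT" "TTCTCTTCTC" "CAGTAATAAA" "CAATTACCTT" "TCATTGAAAA"
    "AAAAAAAAAA" "AAAAAA"
)
TAIL_START = 951  # 1-based coordinate of TAIL[0]


def signal_positions(seq, first_pos=1, signal="AATAAA"):
    """Return the 1-based start coordinate of every (possibly overlapping) hit."""
    hits = []
    i = seq.find(signal)
    while i != -1:
        hits.append(first_pos + i)
        i = seq.find(signal, i + 1)  # step by one so overlapping hits are kept
    return hits


hits = signal_positions(TAIL, TAIL_START)
print("1-based:", hits)
print("0-based:", [h - 1 for h in hits])
```

In this sketch the hexamer starts at 1-based coordinate 975, so the reported value of 974 would correspond to a 0-based offset. That reading is an inference from this one example, not something the report states.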



1 GAATGGGGAG GAGCCAGAGG AAGAGGGCGG CGACGGTGGT GGTGACTGAG
51 CGGAGCCCGG TGACACGATG TTGGTGTTGG TATTAGGAGA TCTGCACATC
101 CCACACCGGT GCAACAGTTT GCCAGCTAAA TTCAAAAAAC TCCTGGTGCC
151 AGGAAAAATT CAGCACATTC TCTGCACAGG AAACCTTTGC ACCAAAGAGA
201 GTTATGACTA CCTCAAGACT CTGGCTGGTG ATGTTCATAT TGTGAGAGGA
251 GACTTCGATG AGAATCTGAA TTATCCAGAA CAGAAAGTTG TGACTGTTGG
301 ACAGTTCAAA ATTGGTCTGA TCCATGGACA TCAAGTTATT CCATGGGGAG
351 ATATGGCCAG CTTAGCCCTG TTGCAGAGGC AATTTGATGT GGACATTCTT
401 ATCTCGGGAC ACACACACAA ATCTGAAGCA TTTGAGCATG AAAATAAATT
451 CTACATTAAT CCAGGTTCTG CCACTGGGGC ATATAATGCC TTGGAAACAA
501 ACATTATTCC ATCATTTGTG TTGATGGATA TCCAGGCTTC TACAGTGGTC
551 ACCTATGTGT ATCAGCTAAT TGGAGATGAT GTGAAAGTAG AACGAATCGA
601 ATACAAAAAA CCTTAAAGCC AGGCCTGTCT TGATGATTTT TGGTTTTTTT
651 TCATTGTCCT GTTGAAATCA AGTAATTAAA CATTTAAGAG CCACAAAATT
701 GTATCACTTT TATAATATTT TGCAGTAAAA TATAATACCA TCTTCTCTGT
751 TAATACATAA TTGCTCCAAG CTTCCTGTAA ACTATAAGAA TATATTTAGT
801 TTACAGTATA TGGATTCTAT GAAAAAATGT CCACAACACA GTAATTGGTC
851 ACTTGTTAAG AAAAATTTAT CCTTGTAAGT ATCTTCAAAG TTGATATTTG
901 GAACTTTATT CCAAAAGTAG TGCATGTGGA GAAAGAATCT AGACTTTCTT
951 GTATACATTT TTCTCTTCTC CAGTAATAAA CAATTACCTT TCATTGAAAA
1001 AAAAAAAAAA AAAAAA



BLAST Results 



Entry HSAC2350 from database EMBLNEW: 

Homo sapiens 12q24 PAC P424M6 Length = 167,217 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 68 bp to 613 bp; peptide length: 182 
Category: similarity to known protein 
Prosite motifs: RGD (60-63) 
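
The ORF coordinates and the peptide length quoted above are related by simple arithmetic, which the sketch below makes explicit. It assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted in the ORF span; both assumptions fit the numbers given here but are not stated in the report. The function name `peptide_length` is illustrative.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based, inclusive nucleotide coordinates."""
    n_nt = orf_end - orf_start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError(f"ORF span of {n_nt} nt is not a whole number of codons")
    return n_nt // 3


# ORF from 68 bp to 613 bp: (613 - 68 + 1) / 3 codons
print(peptide_length(68, 613))
```

Under these assumptions the span of 546 nucleotides gives 182 codons, in agreement with the stated peptide length.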



1 MLVLVLGDLH IPHRCNSLPA KFKKLLVPGK IQHILCTGNL CTKESYDYLK
51 TLAGDVHIVR GDFDENLNYP EQKVVTVGQF KIGLIHGHQV IPWGDMASLA
101 LLQRQFDVDI LISGHTHKSE AFEHENKFYI NPGSATGAYN ALETNIIPSF
151 VLMDIQASTV VTYVYQLIGD DVKVERIEYK KP

BLASTP hits 

Entry CEZK1128_6 from database TREMBL: 
"ZK1128.1"; Caenorhabditis elegans cosmid ZK1128 
Length = 523 

Score = 400 (140.8 bits), Expect = 2.3e-37, P = 2.3e-37
Identities = 81/150 (54%), Positives = 106/150 (70%) 

Entry S46793 from database PIR: 

hypothetical protein YHR012c - yeast (Saccharomyces cerevisiae) 
Length = 282 

Score = 180 (63.4 bits), Expect = 3.7e-37, Sum P(3) = 3.7e-37 
Identities = 35/71 (49%), Positives = 44/71 (61%) 

Entry AB011824_1 from database TREMBL:
"Vps29"; Schizosaccharomyces pombe mRNA for Vps29,
partial cds. Schizosaccharomyces pombe (fission yeast)
Length = 176

Score = 189 (66.5 bits), Expect = 2.7e-27, Sum P(2) = 2.7e-27 
Identities = 33/72 (45%), Positives = 50/72 (69%) 
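
The percentages in the hit summaries above can be recomputed from the fractions next to them. The sketch below parses one such line and recomputes the values; it assumes the percentages are truncated rather than rounded, which matches every hit listed for this clone (for example 33/72 is 45.8%, reported as 45%). The names `FRACTION_RE`, `parse_fractions` and `floor_percent` are illustrative, not part of any BLAST tool.

```python
import re

# Matches e.g. "Identities = 33/72 (45%)" or "Positives = 50/72 (69%)".
FRACTION_RE = re.compile(r"(Identities|Positives) = (\d+)/(\d+) \((\d+)%\)")


def parse_fractions(line):
    """Map each label to (matches, aligned_length, reported_percent)."""
    return {
        name: (int(n), int(d), int(p))
        for name, n, d, p in FRACTION_RE.findall(line)
    }


def floor_percent(matches, length):
    """Percentage truncated towards zero (assumed convention of the report)."""
    return matches * 100 // length


line = "Identities = 33/72 (45%), Positives = 50/72 (69%)"
for name, (n, d, reported) in parse_fractions(line).items():
    print(name, floor_percent(n, d), reported)
```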



Alert BLASTP hits for DKFZphfbr2_23f2, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23f2, frame 2



Report for DKFZphfbr2_23f2.2



[LENGTH] 182

[MW] 20445.84

[pI] 6.29

[HOMOL] TREMBL:CEZK1128_6 gene: "ZK1128.8"; Caenorhabditis elegans cosmid ZK1128 2e-51

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] r general function prediction [M. jannaschii, MJ0623] 1e-16

[BLOCKS] BL01269D

[BLOCKS] BL01269A

[PROSITE] RGD 1

[PROSITE] MYRISTYL 4

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] Alpha_Beta



SEQ MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR 

PRD ccceeecccccccccccchhhhhhhhhhcceeeeeecccccchhhhhhhhhhhhceeeee 

SEQ GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE 

PRD cccccccccccceeeeeccceeeeecccccccccchhhhhhhhhhhcceeeeeccccccc 

SEQ AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK 

PRD ccccccccccccccccccccccccccccceeeeeccccceeeeeeeecccceeeeeeeec 

SEQ KP 

PRD cc



Prosite for DKFZphfbr2_23f2.2

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00008 38->44 MYRISTYL PDOC00008
PS00008 83->89 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 137->143 MYRISTYL PDOC00008
PS00016 60->63 RGD PDOC00016

(No Pfam data available for DKFZphfbr2_23f2.2)


DKFZphfbr2_23124



group: intracellular transport and trafficking 

DKFZphfbr2_23124.2 encodes a novel 348 amino acid protein with similarity to human
glycoprotein gp36b and canine VIP36 glycoprotein.

The vesicular protein VIP36 (36 kDa vesicular integral membrane protein) shows homology to
leguminous plant lectins. The protein is localized to the Golgi apparatus, endosomal and
vesicular structures and the plasma membrane. VIP36 binds to sugar residues of
glycosphingolipids and/or glycosylphosphatidyl-inositol anchors and might provide a link
between the extracellular/luminal face of glycolipid rafts and the cytoplasmic protein
segregation machinery. Gp36 is located within the endoplasmic reticulum. For the novel
protein, a lectin character is predicted. Given the intracellular localisation of the homologous
proteins, it should be involved in intracellular transport and trafficking.

The new protein can find application in modulating/blocking intracellular transport and
trafficking.



strong similarity to human GP36b glycoprotein
complete cDNA, complete cds, EST hits

potential start at Bp 29 matches Kozak consensus ANNatgG
similarity to lectins

Sequenced by AGOWA

Locus: /map="2"

Insert length: 2416 bp

Poly A stretch at pos. 2394, no polyadenylation signal found
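
The statement that the start at bp 29 matches the consensus ANNatgG can be verified directly on the first 50 nucleotides of the insert. The sketch below assumes the consensus is read as: A three bases upstream of the ATG, any two bases, the ATG itself, then G immediately after it. The names `FIRST_50` and `matches_kozak` are illustrative.

```python
# First 50 nucleotides of the insert, copied from the sequence listing.
FIRST_50 = "GGGGGATGAA" "GGGTCGTTGG" "TGGGAAAGAT" "GGCGGCGACT" "CTGGGACCCC"


def matches_kozak(seq, atg_pos):
    """True if the ATG at 1-based position atg_pos sits in an ANNatgG context."""
    i = atg_pos - 1                      # 0-based index of the A in ATG
    if i - 3 < 0 or i + 3 >= len(seq):   # need 3 bases upstream and 1 downstream
        return False
    return seq[i - 3] == "A" and seq[i:i + 3] == "ATG" and seq[i + 3] == "G"


print(matches_kozak(FIRST_50, 29))  # the proposed start codon
print(matches_kozak(FIRST_50, 6))   # an upstream ATG, for comparison
```

In this sketch the ATG at position 29 satisfies the pattern, while the upstream ATG at position 6 does not.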



1 GGGGGATGAA GGGTCGTTGG TGGGAAAGAT GGCGGCGACT CTGGGACCCC
51 TTGGGTCGTG GCAGCAGTGG CGGCGATGTT TGTCGGCTCG GGATGGGTCC
101 AGGATGTTAC TCCTTCTTCT TTTGTTGGGG TCTGGGCAGG GGCCACAGCA
151 AGTCGGGGCG GGTCAAACGT TCGAGTACTT GAAACGGGAG CACTCGCTGT
201 CGAAGCCCTA CCAGGGTGTG GGCACAGGCA GTTCCTCACT GTGGAATCTG
251 ATGGGCAATG CCATGGTGAT GACCCAGTAT ATCCGCCTTA CCCCAGATAT
301 GCAAAGTAAA CAGGGTGCCT TGTGGAACCG GGTGCCATGT TTCCTGAGAG
351 ACTGGGAGTT GCAGGTGCAC TTCAAAATCC ATGGACAAGG AAAGAAGAAT
401 CTGCATGGGG ATGGCTTGGC AATCTGGTAC ACAAAGGATC GGATGCAGCC
451 AGGGCCTGTG TTTGGAAACA TGGACAAATT TGTGGGGCTG GGAGTATTTG
501 TAGACACCTA CCCCAATGAG GAGAAGCAGC AAGAGCGGGT ATTCCCCTAC
551 ATCTCAGCCA TGGTGAACAA CGGCTCCCTC AGCTATGATC ATGAGCGGGA
601 TGGGCGGCCT ACAGAGCTGG GAGGCTGCAC AGCCATTGTC CGCAATCTTC
651 ATTACGACAC CTTCCTGGTG ATTCGCTACG TCAAGAGGCA TTTGACGATA
701 ATGATGGATA TTGATGGCAA GCATGAGTGG AGGGACTGCA TTGAAGTGCC
751 CGGAGTCCGC CTGCCCCGCG GCTACTACTT CGGCACCTCC TCCATCACTG
801 GGGATCTCTC AGATAATCAT GATGTCATTT CCTTGAAGTT GTTTGAACTG
851 ACAGTGGAGA GAACCCCAGA AGAGGAAAAG CTCCATCGAG ATGTGTTCTT
901 GCCCTCAGTG GACAATATGA AGCTGCCTGA GATGACAGCT CCACTGCCGC
951 CCCTGAGTGG CCTGGCCCTC TTCCTCATCG TCTTTTTCTC CCTGGTGTTT
1001 TCTGTATTTG CCATAGTCAT TGGTATCATA CTCTACAACA AATGGCAGGA
1051 ACAGAGCCGA AAGCGCTTCT ACTGAGCCCT CCTGCTGCCA CCACTTTTGT
1101 GACTGTCACC CATGAGGTAT GGAAGGAGCG GGCACTGGCC TGAGCATGCA
1151 GCCTGGAGAG TGTTCTTGTC TCTAGCAGCT GGTTGGGGAC TATATTCTGT
1201 CACTGGAGTT TTGAATGCAG GGACCCCGCA TTCCCATGGT TGTGCATGGG
1251 GACATCTAAC TCTGGTCTGG GAAGCCACCC ACCCCAGGGC AATGCTGCTG
1301 TGATGTGCCT TTCCCTGCAG TCCTTCCATG TGGGAGCAGA GGTGTGAAGA
1351 GAATTTACGT GGTTGTGATG CCAAAATCAC GGAACAGAAT TTCATAGCCC
1401 AGGCTGCCGT GTTGTTTGAC TCAGAAGGCC CTTCTACTTC AGTTTTGAAT
1451 CCACAAAGAA TTAAAAACTG GTAACACCAC AGGCTTTCTG ACCATCCATT
1501 CGTTGGGTTT TGCATTTGAC CCAACCCTCT GCCTACCTGA GGAGCTTTCT
1551 TTGGAAACCA GGATGGAAAC TTCTTCCCTG CCTTACCTTC CTTTCACTCC
1601 ATTCATTGTC CTCTCTGTGT GCAACCTGAG CTGGGAAAGG CATTTGGATG
1651 CCTCTCTGTT GGGGCCTGGG GCTGCAGAAC ACACCTGCGT TTCGCTGGCC
1701 TTCATTAGGT GGCCCTAGGG AGATGGCTTT CTGCTTTGGA TCACTGTTCC
1751 CTAGCATGGG TCTTGGGTCT ATTGGCATGT CCATGGCCTT CCCAATCAAG
1801 TCTCTTCAGG CCCTCAGTGA AGTTTGGCTA AAGGTTGGTG TAAAAATCAA
1851 GAGAAGCCTG GAAGACACCA TGGATGCCAT GGATTAGCTG TGCAACTGAC
1901 CAGCTCCAGG TTTGATCAAA CCAAAAGCAA CATTTGTCAT GTGGTCTGAC
1951 CATGTGGAGA TGTTTCTGGA CTTGCTAGAG CCTGCTTAGC TGCATGTTTT
2001 GTAGTTACGA TTTTTGGAAT CCCTCTTTGA GTGCTGAAAG TGTAAGGAAG
2051 CTTTCTTCTT ACACCTTGGG CTTGGATATT GCCCAGAGAA GAAATTTGGC
2101 TTTTTTTTCT TAATGGACAA GGGACAGTTG CTGTTCTCAT GTTCCAAGTC
2151 TGAGAGCAAC AGACCCTCAT CATCTGTGCC TGGAAGAGTT CACTGTCATT
2201 GAGCAGCACA GCCTGAGTGC TGGCCTCTGT CAACCCTTAT TCCACTGCCT
2251 TATTTGACAA GGGGTTACAT GCTGCTCACC TTACTGCCCT GGGATTAAAT
2301 CAGTTACAGG CCAGAGTCTC CTTGGAGGGC CTGGAACTCT GAGTCCTCCT
2351 ATGAACCTCT GTAGCCTAAA TGAAATTCTT AAAATCACCG ATGGAACCAA
2401 AAAAAAAAAA AAAAAA



BLAST Results 



Entry HS622145 from database EMBL: 
human STS WI-6746. 
Score = 1079, p = 5.1e-43, identities = 219/223 

Entry G42541 from database EMBLNEW: 

SHGC-58649 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 1091, P = 1.7e-43, identities = 219/220 



Medline entries 



94265253: 

A putative novel class of animal lectins in the secretory pathway
homologous to leguminous lectins.

94208543: 

VIP36, a novel component of glycolipid rafts and exocytic carrier 
vesicles in epithelial cells. 



Peptide information for frame 2 



ORF from 29 bp to 1072 bp; peptide length: 348 
Category: strong similarity to known protein 



1 MAATLGPLGS WQQWRRCLSA RDGSRMLLLL LLLGSGQGPQ QVGAGQTFEY 

51 LKREHSLSKP YQGVGTGSSS LWNLMGNAMV MTQYIRLTPD MQSKQGALWN 

101 RVPCFLRDWE LQVHFKIHGQ GKKNLHGDGL AIWYTKDRMQ PGPVFGNMDK 

151 FVGLGVFVDT YPNEEKQQER VFPYISAMVN NGSLSYDHER DGRPTELGGC 

201 TAIVRNLHYD TFLVIRYVKR HLTIMMDIDG KHEWRDCIEV PGVRLPRGYY 

251 FGTSSITGDL SDNHDVISLK LFELTVERTP EEEKLHRDVF LPSVDNMKLP 

301 EMTAPLPPLS GLALFLIVFF SLVFSVFAIV IGIILYNKWQ EQSRKRFY

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23124, frame 2 

PIR:G01447 GP36b glycoprotein - human, N = 1, Score = 1001, P = 
5.9e-101 

SWISSPROT:VP36_CANFA VESICULAR INTEGRAL-MEMBRANE PROTEIN VIP36
PRECURSOR (VIP36)., N = 1, Score = 990, P = 8.6e-100

TREMBL:CET04G9_2 gene: "T04G9.3"; Caenorhabditis elegans cosmid
T04G9., N = 1, Score = 614, P = 6e-60

PIR:S42626 ER-golgi intermediate compartment protein - human, N = 2,
Score = 397, P = 1e-42



>PIR:G01447 GP36b glycoprotein - human 
Length = 356 

HSPs:

Score = 1001 (150.2 bits), Expect = 5.9e-101, P = 5.9e-101 
Identities = 197/356 (55%), Positives = 256/356 (71%) 

Query: 1 MAATLGPLGSWQQWRRCLSARDG SRMLLLLLLLGSGQGPQQVGAGQTFEYLK 52 

MAA G + W RRCL R G + L LLLLLGS + G + E+LK 

Sbjct: 1 MAAE-GWIWRWGWGRRCLG-RPGLLGPGPGPTTPLFLLLLLGSVTA--DITDGNS-EHLK 55

Query: 53 REHSLSKPYQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQ 112

 REHSL KPYQGVG+ S LW+ G+ M+ +QY+RLTPD +SK+G++WN PCFL+DWE+
Sbjct: 56 REHSLIKPYQGVGSSSMPLWDFQGSTMLTSQYVRLTPDERSKEGSIWNHQPCFLKDWEMH 115

Query: 113 VHFKIHGQGKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVF 172

 VHFK+HG GKKNLHGDG+A+WYT+DR+ PGPVFG+ D F GL +F+DTYPN+E ERVF
Sbjct: 116 VHFKVHGTGKKNLHGDGIALWYTRDRLVPGPVFGSKDNFHGLAIFLDTYPNDETT-ERVF 174

Query: 173 PYISAMVNNGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKH 232

 PYIS MVNNGSLSYDH +DGR TEL GCTA RN +DTFL +RY + LT+M D++ K+
Sbjct: 175 PYISVMVNNGSLSYDHSKDGRWTELAGCTADFRNRDHDTFLAVRYSRGRLTVMTDLEDKN 234

Query: 233 EWRDCIEVPGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLP 292 

EW++CI++ GVRLP GYYFG S+ TGDLSDNHD+IS+KLF+L VE TP+EE + P 
Sbjct: 235 EWKNCIDITGVRLPTGYYFGASAGTGDLSDNHDIISMKLFQLMVEHTPDEESIDWTKIEP 294 

Query: 293 SVDNMKLPEMTAPLP PLSGLALFLI VFFSLVFSVFAI VIGI I LYNKWQEQSRK 345 

SV+ +K P+ P PL+G +FL++ +L+ V V+G +++ K QE++ K 

Sbjct: 295 SVNFLKSPKDNVDDPTGNFRSGPLTGWRVFLLLLCALLGIVVCAVVGAVVFQKRQERN-K 353 

Query: 346 RFY 348 
RFY 

Sbjct: 354 RFY 356 

Pedant information for DKFZphfbr2_23124, frame 2 



Report for DKFZphfbr2_23124.2

[LENGTH] 348

[MW] 39711.10

[pI] 8.55

[HOMOL] PIR:G01447 GP36b glycoprotein - human 1e-101

[PIRKW] lectin 2e-37

[PIRKW] transmembrane protein 2e-37

[PIRKW] endoplasmic reticulum 2e-37

[PIRKW] Golgi apparatus 2e-37

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta 

[KW] SIGNAL_PEPTIDE 39 

[KW] LOW_COMPLEXITY 7.76 % 

SEQ MAATLGPLGSWQQWRRCLSARDGSRMLLLLLLLGSGQGPQQVGAGQTFEYLKREHSLSKP 
SEG xxxxxxx 



PRD ccccccccccccccccccccccchhhhhhhhhhhcccccccccccchhhhhhhhhhhccc 

SEQ YQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQVHFKIHGQ

SEG

PRD cccccccccceeecccccccccceeeeccchhhhhcccccccccchhhhhhhheeeeecc

SEQ GKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVFPYISAMVN

SEG

PRD ccccccccceeeeeecccccccccccccccccceeeeeecccccccccccccceeeeeec

SEQ NGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKHEWRDCIEV

SEG

PRD ccccccccccccccccccccccccccccccceeeehhhhhhheeeeeccccccccccccc

SEQ PGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLPSVDNMKLP

SEG

PRD cccccccccccccccccccccccchhhhhhhhhhhhhccccccccccccccccccccccc

SEQ EMTAPLPPLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRKRFY

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 



Prosite for DKFZphfbr2_23124.2

PS00001 181->185 ASN_GLYCOSYLATION PDOC00001

PS00002 35->39 GLYCOSAMINOGLYCAN PDOC00002

PS00005 19->22 PKC_PHOSPHO_SITE PDOC00005

PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005

PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005




PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006

PS00006 279->283 CK2_PHOSPHO_SITE PDOC00006

PS00008 43->49 MYRISTYL PDOC00008

PS00008 63->69 MYRISTYL PDOC00008

PS00008 65->71 MYRISTYL PDOC00008

PS00008 96->102 MYRISTYL PDOC00008

PS00008 198->204 MYRISTYL PDOC00008

PS00009 120->124 AMIDATION PDOC00009

(No Pfam data available for DKFZphfbr2_23124.2)


DKFZphfbr2_23nl6



group: signal transduction 

DKFZphfbr2_23nl6.1 encodes a novel 292 amino acid protein with weak similarity to putative
phosphatidylinositol-4-phosphate 5-kinase of Arabidopsis thaliana.

The novel protein contains a WW domain, which was originally described as a short
conserved region in a number of unrelated proteins, among them dystrophin, the product of the
gene responsible for Duchenne muscular dystrophy. The domain, which spans about 35 residues, is
repeated up to 4 times in some proteins. It has been shown to bind proteins with particular
proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. This domain is
frequently associated with other domains typical for proteins in signal transduction
processes. Examples of proteins containing the WW domain are dystrophin, utrophin, vertebrate
YAP protein (binds the SH3 domain of the Yes oncoprotein), murine NEDD-4 (embryonic
development and differentiation of the central nervous system) and IQGAP (human GTPase activating
protein acting on ras). Therefore the new protein should be involved in intracellular signal
transduction.
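
The proline motif [AP]-P-P-[AP]-Y named above translates directly into a regular expression, which can be used to screen candidate binding partners of a WW domain. The sketch below is illustrative only: the function name and the example peptides are hypothetical and are not sequences from this document. A lookahead is used so that overlapping occurrences are reported as well.

```python
import re

# [AP]-P-P-[AP]-Y written as a regex; the lookahead allows overlapping hits.
WW_LIGAND_RE = re.compile(r"(?=([AP]PP[AP]Y))")


def find_ww_ligand_motifs(peptide):
    """Return (1-based start, matched residues) for each [AP]-P-P-[AP]-Y motif."""
    return [(m.start() + 1, m.group(1)) for m in WW_LIGAND_RE.finditer(peptide.upper())]


# Hypothetical peptides, used only to exercise the pattern.
print(find_ww_ligand_motifs("GTPPPPYTVG"))
print(find_ww_ligand_motifs("AAPPAYAA"))
print(find_ww_ligand_motifs("MKLGYGK"))
```

Note that the scan looks for the ligand motif, not for the WW domain itself; detecting the domain requires a profile or signature such as the Prosite entry listed in the report.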

The new protein can find application in modulating/blocking intracellular signal transduction
pathways.



similarity to putative phosphatidylinositol-4-phosphate 5-kinase

complete cDNA, complete cds, EST hits

Sequenced by AGOWA

Locus: unknown

Insert length: 2936 bp

Poly A stretch at pos. 2916, polyadenylation signal at pos. 2873



1 GGGGGCGCTC CCGAGAAAGA GTGAGGGCGC GACGCGCACC AACGGTGGAG 
51 GGATGTTTCA GCAGCCCCTG AGAAGGAAGA GGAGGAAGCT GAGGGCCCGC 
101 TGAGGGCGCA GGACCTGAGG GAGTCCTACA TCCAGCTCGT CCAGGGTGTG 
151 CAGGAGTGGC AGGATGGTTG CATGTACCAG GGGGAGTTTG GGTTGAACAT 
201 GAAGCTTGGA TATGGCAAAT TCTCTTGGCC CACAGGCGAG TCATACCATG
251 GGCAGTTTTA CCGGGACCAC TGCCATGGCC TGGGTACCTA CATGTGGCCA 
301 GATGGCTCCA GTTTCACGGG CACATTTTAC CTCAGCCACC GAGAAGGCTA 
351 CGGCACCATG TACATGAAGA CACGGCTTTT CCAGACTCAC TGCCACAACG 
401 ACATTGTCAA CCTTCTCCTG GACTGTGGGG CCGACGTGAA CAAGTGCTCA 

451 GATGAGGGTC TCACGGCACT CAGCATGTGT TTCCTCCTCC ACTACCCCGC
501 CCAGTCCTTC AAGCCCAATG TTGCTGAACG GACCATACCT GAGCCCCAGG
551 AACCTCCAAA ATTCCCAGTT GTTCCAATCC TTTCATCATC ATTTATGGAC
601 ACAAACCTGG AGTCTCTGTA CTATGAGGTG AACGTGCCTT CCCAGGGTAG 
651 CTATGAGCTG AGGCCACCGC CAGCACCACT GCTCCTGCCA CGCGTCTCAG 
701 GCAGCCACGA GGGCGGCCAC TTCCAGGACA CCGGGCAGTG TGGGGGGTCC 
751 ATAGACCACA GGAGCAGCTC TCTGAAGGGG GACTCCCCGT TGGTGAAGGG
801 CAGCCTTGGC CATGTGGAAA GCGGGCTTGA GGACGTGTTG GGAGACACAG 
851 ACCGGGGCAG TCTGTGCAGT GCTGAGACGA AATTTGAGTC CAACTTGTGT 
901 GTGTGCGACT TCTCCATCGA GCTCTCGCAG GCCATGCTGG AGAGAAGCGC 
951 CCAGTCCCAC AGCTTGCTGA AGATGGCCTC GCCCTCACCG TGCACCAGCA 

1001 GCTTCGACAA AGGGACCATG CGGAGGATGG CGCTGTCCAT GATCGAGTAG 
1051 GTCCTGGCAC CAGCTGGTGG GGGTGGAGGG CCACCATCAG GGCTGAATCC 
1101 TATGCTCAGC AGACCCACGT CTCTTCCCTG TGCCAGTGGG AGGCGTTGTG 
1151 TCTGGAGATG TGTGTCTGAA TGTGTGAGCA TCCCTGTGTC GGTGGCTCCA 
1201 TGCCATGGCC AGCCCTGTGG GGGTGCCACG GTGACGGGCT GTTTTCAGTG 
1251 CCACCCCAGC CCTGTGGGGG TGCCACGGTG ACGGGCTGTT TTCAGTACCA 
1301 CGCCAGCCCT GCTTTGGCCT TTGGCACTGG CCTGAAGTGT CTCTGTGGGA 
1351 GCCTCAGCAG GGGCCACTGT CAGGGGTCCT ATCCTAGCCA TAGTGCACGT 
1401 GAGTGACACC TGCCTGGGCA GCTCTCACAC CCCTGCTGTC CACCCTGTCT 

1451 ATACCAGTGT GTCTCAAAAT GTGGTCTATG CACCCCCGGG GGTCCAAGAC
1501 CCTTTCAGGG AGTCTGTGGG GTCAAAATGA TTCTCTTGAT AACCCTGAGA
1551 CTCTGTTAGC CTTCTCCTTG TGTTGATGTT GGTGGATGGT ATGAAGACAG
1601 GGCCGTGCAG ACCACCAGCC CCCAGCGTGC AGGGCAGCAG TGCCCGGCCT 
1651 GCTTGGGGGC ATGGTATTCC TTCACCACGG TGTGCACTTG CGGGGATGCC 
1701 TGTCTCACTG AAGAATGCCT TTGACTAAGC AGAAAAGCAA TGACAAATTG 
1751 CATTAAATCT TGCTCCTTGC GTACACACCC CTCGAATATT CTGGGTCGGA 
1801 AAACATGGGA AGGACACTGA TGTGTGTCTG CCACAGACCA AGGCACACCG 
1851 CTTCCCCGCA AGAAGCGCTT CCCCCAGGGC CAGAGTAGCA ACAGAATGCG 
1901 GCATCTTCCC AACCTCCTGC CCCATTTTTG ATTGGAAGAA TGACCACTGG 
1951 TATGTGGCTG TTCATTCTCC TGAACACAGC CTGCCACTTT AAGGAAAACA 
2001 TATGACACTA TTTGTTGCTG GCGAAATTTA CATTTTCAAG TGAATAGCAG 
2051 AATTCTGGAC ACTTGCCACC ACCACCAAAA CCTTCATAGC TTCCCTTAAC 
2101 TTTGAGACAT GGGTGTTCAG AGGTTTTTCA CGTGAGATGG CGTTAGCAGC 
2151 GCAGTTTTGT GATACTGCCT GAAGACATGC CGACAGTGCC CAGATCTCTT
2201 CTATTGGTGA GCCAGCTTTT CCCACACGGC CAAGTTCTGA TGTTGAACCA
2251 TTGCCAGGTG GGTGAAGATC CATTGACAGT GAGAGGTGGG CCCGTGGGCT
2301 TCAGTGCAGC CAGGCGCAGA AGGCTGGTTC ATGAGTGTCC AGCTCCGCCA
2351 GGTAGCTAGC TCACCACCCC CAGCCTGGGT TCATGTAGTT CAAATAGGAA
2401 GACCACGATG ATCAGAAAGG CTGCTCAAAT ACTCCTTCGT CCAGCCGCGT
2451 ACCTGGGGGA GGCTGAATCT CCACTCACTT CCACCAAGGC TGTGCAGAGC
2501 AGATAGGGGA ATCCAGCAAA GGTGGAAAAC AGTGCCATCC TTCTCCCCAA
2551 CTGGTTTTGT TTTGTAAAAT AACTTTTTGT GACAGTGTTA CTTATTAGTA 
2601 ACATGCAGTG GGTTTGTTAT GGTTAACAAG TTGGTGAGCA TTATTGAGAG 
2651 GTGAAGCCAG CTGAGCTTCT GGGTTGGGTG GGGACTTGGA GAACTTTTGT 
2701 GTCTAGCTAA AGGATTGTAA ATGCACCAAT CAATGCTCAG TGTCTAGCTA 
2751 AAGGATTGTA AATGCACCAA TCAGCACTCT GTAAAATTGA CCAATCAGCG
2801 TTCTGTAAAA TGGACCAATC AGTGGTCTGT AAAATGGACC AGTCAGCAGG 
2851 ATGTGGGCGG GGCCAAAAAA GGGAATAAAA GCTGGCCACC GCCAGGCTCC 
2901 CCACCAGCCT GCAGCGAAAA AAAAAAAAAA AAAAAA 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 172 bp to 1047 bp; peptide length: 292 
Category: similarity to unknown protein 
Prosite motifs: WW_DOMAIN_1 (19-44)



1 MYQGEFGLNM KLGYGKFSWP TGESYHGQFY RDHCHGLGTY MWPDGSSFTG 

51 TFYLSHREGY GTMYMKTRLF QTHCHNDIVN LLLDCGADVN KCSDEGLTAL 

101 SMCFLLHYPA QSFKPNVAER TIPEPQEPPK FPVVPILSSS FMDTNLESLY 

151 YEVNVPSQGS YELRPPPAPL LLPRVSGSHE GGHFQDTGQC GGSIDHRSSS 

201 LKGDSPLVKG SLGHVESGLE DVLGDTDRGS LCSAETKFES NLCVCDFSIE 

251 LSQAMLERSA QSHSLLKMAS PSPCTSSFDK GTMRRMALSM IE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23nl6, frame 1 

TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for
AtPIP5K1, complete cds., N = 2, Score = 138, P = 1.1e-06

TREMBL:AF019380_1 product: "putative phosphatidylinositol-4-phosphate
5-kinase"; Arabidopsis thaliana putative
phosphatidylinositol-4-phosphate 5-kinase mRNA, complete cds., N = 2,
Score = 138, P = 1.4e-06

PIR:T02098 probable phosphatidylinositol-4-phosphate 5-kinase -
Arabidopsis thaliana, N = 2, Score = 135, P = 6.7e-06

>TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for
AtPIP5K1, complete cds.
Length = 683

HSPs: 

Score = 138 (20.7 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 23/61 (37%), Positives = 35/61 (57%) 

Query: 1 MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 60 

MY+G++ G GKFSWP+G +Y G+F G GT+ DG ++ GT+ + G+ 

Sbjct: 34 MYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGH 93 

Query: 61 G 61 
G 



Sbjct: 94 G 94

Score = 112 (16.8 bits), Expect = 9.7e-04, Sum P(2) = 9.7e-04
Identities = 19/51 (37%), Positives = 27/51 (52%) 

Query: 12 LGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYGT 62 

+G GK+ W G YG + R GG+WP G+++ G F EG+GT 
Sbjct: 22 IGSGKYLWKDGCMYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGT 72 

Score = 97 (14.6 bits), Expect = 4.4e-02, Sum P(2) = 4.3e-02 
Identities = 19/60 (31%), Positives = 32/60 (53%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+GEF G+G F+ G++Y G + D HG G + +G + GT+ + ++G G 

Sbjct: 58 YEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRG 117 

Score = 93 (14.0 bits), Expect = 1.2e-01, Sum P(2) = 1.1e-01
Identities = 18/62 (29%), Positives = 34/62 (54%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + + K G+G+ + G+ Y G + R+ G G Y+W +G+ +TG + + G G 

Sbjct: 81 YRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKG 140 

Query: 62 TM 63 

+ 

Sbjct: 141 LL 142 

Score = 91 (13.7 bits), Expect = 2.0e-01, Sum P(2) = 1.8e-01 
Identities = 18/51 (35%), Positives = 24/51 (47%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTF 52 

Y GE+ ++GG WPGYG+ GG+W DGSS G + 

Sbjct: 127 YTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNGVFTWSDGSSCVGAW 177 

Score = 90 (13.5 bits), Expect = 2.6e-01, Sum P(2) = 2.3e-01 
Identities = 17/60 (28%), Positives = 31/60 (51%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + N++ G G++ W G Y G++ G G +WP+G+ + G + +G G 

Sbjct: 104 YEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNG 163 

Score = 45 (6.8 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 14/62 (22%), Positives = 26/62 (41%)

Query: 215 VESGLEDVLGDTDRGSLCSAETKFESNLCVCDF--SIELSQAMLERSAQSHSLLKMASPS 272

V+SG + G+ +C E+ E+ CD ++E S +R + + + 

Sbjct: 205 VDSGAGSLGGEKVFPRICIWESDGEAGDITCDIIDNVEASMIYRDRISVDRDGFRQFKKN 264 

Query: 273 PC 274 
PC 

Sbjct: 265 PC 266 



Pedant information for DKFZphfbr2_23nl6, frame 1



Report for DKFZphfbr2_23nl6.1



[LENGTH] 292

[MW] 32214.44

[pI] 5.51

[HOMOL] TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for AtPIP5K1,
complete cds. 7e-08

[BLOCKS] BL01137A Hypothetical YBL055c/yjjV family proteins

[PROSITE] WW_DOMAIN_1 1

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 5 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.11 % 



SEQ MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 

SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ GTMYMKTRLFQTHCHNDIVNLLLDCGADVNKCSDEGLTALSMCFLLHYPAQSFKPNVAER

SEG 

PRD cccchhhhhheeeccccchhhhhcccccccccccccchhhhhhhhhccccccccccceee 

SEQ TIPEPQEPPKFPVVPILSSSFMDTNLESLYYEVNVPSQGSYELRPPPAPLLLPRVSGSHE
SEG xxxxxxxxxxxx

PRD eccccccccceeeeeeeccccccccccceeeeeecccccccccccccccccccccccccc 

SEQ GGHFQDTGQCGGSIDHRSSSLKGDSPLVKGSLGHVESGLEDVLGDTDRGSLCSAETKFES 

SEG 

PRD cccccccccccccccccccccccccceeecccccccccccccccccccccceeeeecccc 

SEQ NLCVCDFSIELSQAMLERSAQSHSLLKMASPSPCTSSFDKGTMRRMALSMIE 

SEG 

PRD cccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhccc 



Prosite for DKFZphfbr2_23nl6.1

PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006
PS00006 121->125 CK2_PHOSPHO_SITE PDOC00006
PS00006 140->144 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006
PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006
PS00008 45->51 MYRISTYL PDOC00008
PS00008 86->92 MYRISTYL PDOC00008
PS00008 177->183 MYRISTYL PDOC00008
PS00008 188->194 MYRISTYL PDOC00008
PS00008 229->235 MYRISTYL PDOC00008
PS01159 19->44 WW_DOMAIN_1 PDOC50020

(No Pfam data available for DKFZphfbr2_23nl6.1)


DKFZphfbr2_23o24



group: brain derived 

DKFZphfbr2_23o24 encodes a novel 139 amino acid protein with similarity to CAAX-box proteins. 

The CAAX box is a prenyl group binding site found in a number of eukaryotic proteins, such as
Ras and Ras-like proteins (Rho, Rab, Rac, Ral and Rap), nuclear lamins A and B, some
G protein alpha and gamma subunits and some dnaJ-like proteins. These proteins are
posttranslationally modified at this site by the attachment of either a farnesyl or a
geranyl-geranyl group to a cysteine residue.
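
A C-terminal CAAX box can be tested for with a single anchored regular expression. The sketch below assumes a Prosite-style prenylation pattern of the form C-{DENQ}-[LIVM]-x at the very end of the chain; this pattern is written from memory and should be treated as an assumption rather than a quotation of the database entry. The C-terminal residues are taken from the peptide listed for this clone; the function name is illustrative.

```python
import re

# Assumed pattern: Cys, any residue except D/E/N/Q, one of L/I/V/M, any residue, end.
CAAX_RE = re.compile(r"C[^DENQ][LIVM].$")


def c_terminal_caax(peptide):
    """Return the four C-terminal residues if they form a CAAX-like box, else None."""
    m = CAAX_RE.search(peptide.upper())
    return m.group(0) if m else None


# Last 19 residues of the 139 amino acid peptide reported for this clone.
C_TERM = "QDGEVPCQPVLWWGSCSLK"
print(c_terminal_caax(C_TERM))
```

Under the assumed pattern the terminal residues CSLK qualify, which is consistent with the PRENYLATION entry in the report for this clone.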

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to lectins 

complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 
Locus: unknown 



Insert length: 3564 bp 

Poly A stretch at pos. 3541, no polyadenylation signal found



1 
51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 
2101 
2151 
2201 
2251 
2301 
2351 
2401 



GAATGGCTCC 
ATGGGCCTTC 
AACATGGCAG 
GAAGTGAGTG 
TTGTTTGTTT 
TTGCTGGGAC 
CAGGGTTACT 
TGTGCCAGGC 
TAACAGGTGA 
GTTTCCATCA 
CTCCAGCACT 
CATCTCGCCT 
ACCATCCTTC 
CTCCTATGTC 
TTGCAGATGA 
AGGAACCCCC 
ACGGGGGATA 
GAAACGACCT 
CCCTTCACTG 
GGGGAGCCTC 
CCGGGGCAGT 
GTGGGGGTCA 
GGGAAGCTGA 
ATTCATGCCA 
CACAAGAGCC 
CCCAAAAACT 
AAGCCTCTGT 
TCTCCAGGTG 
TGGTCTTTCC 
TCCTCAGAGT 
ATAAGTCTTC 
TTTAGTCGAA 
CAAATTGCCA 
ATATTTCAAA 
AAATCTATAT 
TTCTGTTTGA 
GAACATGGAG 
TGAGCACCGT 
GGAGTGTTTT 
TACTCTGTTG 
TTAAAGTGTT 
CCATATTAGA 
AGAAACGGTC 
GGAAAGCAGT 
TGAGGGGGTG 
ACGTCCTCTC 
TCCTTTCTGA 
GGGGTTCCCT 
GAAAAATTAC 



GCAGATGGCC 
AGCAGGGGGT 
AACTGCTCAG 
CAGTTCATTT 
CGTAACTTTA 
CGTTACTCAG 
CCTCAGAATC 
CCTATGCCTG 
CCACTGGGGT 
ACACCCAGAT 
GCCTCCTCAC 
GGTGAGGTCA 
CCCCTGTGCA 
CCCTTCACCC 
TGGAAGAGAA 
AGCCCAAGCC 
CGCCGGTGCT 
CACCCCTCCA 
GCCCACCCAG 
ACCTCTACCC 
CACGTCAGGA 
TGTAGTCTGA 
GCCTGGGTGC 
CAGACCCACC 
AGGCACACAC 
CCAGCTTTGC 
GACCAAACCC 
TTTTTCAGAG 
CGGATCATGA 
CATATGAGAC 
CAAAAATGTA 
AATATCGTGA 
CTGTTAACAG 
CCCTTTTCTG 
ACAGGTTTTT 
ACAGCTGCTA 
TTACACCAAG 
GCAGCCAAGA 
CTACATAGCG 
AGTGCTTCAT 
TATAAAACAG 
TCATCACAAA 
TTCCCACACT 
AAAATCAGCG 
ACTCATGGGC 
CTGCCTCTCA 
CATTTCCTAG 
TTTAAATTTG 
AGAGAGATGA 



GGCACTGAGA 
TGCGGGGGGA 
TGGGAGACTC 
GTAATCTTGT 
AAGGTATGCA 
AGTTCCTAGA 
ACTTGTCACT 
GAGGTTGGGA 
AAGCACTGTG 
GACCGTGCCT 
CCCACCCCTT 
CGGCTTAGCC 
GATTGGAGGA 
CCCATGGCAC 
GACTCCAGGT 
TCACTGCTCG 
GTTTCCCTGC 
ACCACTTTCC 
GGC AGTTGAC 
ACAGGGCCGC 
TGGAGAGGTC 
AATGACCTGC 
CTTTTTGGTG 
TTCTTGAGCA 
TGAGCAGAGA 
AGAGACCAAG 
GGAGCTTGCC 
GACTTGGTTT 
AAGGATCTGC 
TGAAACTGCT 
GGGTATTAAG 
TTCAGGTATA 
AAAACACACC 
CCCACACATT 
T T T T AAT TAG 
ATGTCAATTC 
AATTTTAAAA 
CTGAGAGATC 
TATAATTATG 
GTTTGAGGTA 
GAAAAATCCA 
ATTATATATA 
TGCTTTAAAT 
AGGAGCTCGT 
CAAGCAGGGC 
CTCTCTGGAG 
ACATCAGACT 
TTCACTCTAG 
TGTGTTGGGT 



GCCAGCAAGA 
GCTTTAAACT 
TCAGCACAGA 
TGTCGAGTTC 
CTTTATATAG 
AATGTACACA 
TCTTTAAATG 
GCTTCATCTA 
TGACTGCAAA 
ATGTGCCCCT 
TCTGCAGCTC 
TGTTGGCCAG 
GGCCAGGTCT 
AGATGAGACA 
TGCCAGGTGT 
TGTTCCCAGC 

TCAGATACAA 
AAGGTGCCAG 
AGAGGGATGC 
GGCCTTGTCC 
CCATGTCAGC 
CGATGGTCCA 
CCTACTCTGA 
ACAACACATA 
AAGTCCCTGT 
GTTCTTCTCT 
CTTCTGAGGC 
AAATTTGTTC 
CGCAAAGGTG 
TATAACATTT 
AGTTTAGTGA 
TTTAGACATT 
CCAAGCACAT 
CTTAAAAATA 
CTTGGAAAAG 
CTGTGGGAAG 
CAAAGACGCT 
AGTCTGAGAC 
GAGCCACACA 
TTTTCGTGTT 
CGAGCAGGTA 
TAGCAGAGTC 
GGCCATGACC 
GGGAAAAATG 
CACACAGGTA 
ACTGGACTTC 
TTGCTACTTA 
TTAGCATTTG 
AAGAGATGGT 



AGCGGAGGAG 
GAGCCCTGTA 
CGGTCATGGG 
TGGGTTTTTT 
ATTTATTTAT 
GCTTTTTTAC 
AATGAATGAA 
CATCACATTC 
GCCAGGGTGT 
GTTGTCCTCC 
CTCATCTAAA 
TGGCCCCACC 
CTCCCCTTAG 
TTCACAGAGT 
GTCCACTCTC 
CAACCCCAGC 
CCAGTTACCA 
GACAGAGAAG 
CCTCCTTGGA 
TGGATTCTCA 
CAGTTCTTTG 
GGCTGAGCCA 
CTTGAGTTGG 
TAGCCACCAA 
CGCCTCACCA 
ACCTTTGCAG 
CTCTAGCATT 
ACCCCAAATG 
AATCTGAGTC 
CCGTGACCTA 
CATTAAAAAG 
TGATTCATGC 
TAATGCCTAG 
ATATACTGAG 
AGCAGTTGTA 
AAAGACCAAA 
GTCCCTTTCC 
CTGTGATTAA 
AGTGGGCCAT 
CCAACTTACA 
TTGACACTAT 
ATAAACAATG 
TAGTGTTTAG 
AGACGGGCCC 
CCAGGCCGCC 
CTTTACTGCC 
GTACACAAAC 
CAGAAGCTGT 
TTAAAAGTCC 
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24 51 AGCTTGCTGT TTTTCATTAA GTGTCTTGAA AATGAGTAAG TGGCGTTCCT 

2501 GGAGGGGAAC AATCATATAA TTCCGCAGGG TGGGTCTAAA CTTGTTTTCT 

2551 GATAGTGTTT AGCAGCTCAT GGCTCTGAGG GCACCTGATA ACACAGCAGC 

2601 CAGGCGCTGA TGAGAAGTGT GTGCCAGACA GACCCGAGTG TGGCTTGGCT 

2 651 CTTGCCTTAT GTTCCTTTCT CTGTTCAGAG AAGCGTGAGA TGAGATTTTG 

2701 TGATTATATT GCACTCCTTG GGCTGACTTT CCCATGCACA GAATGTTTTA 

2751 CACATCCTGA TAGCTGAGCT GAAAATGCAA AGAGAAGGGA AAATGCCTTA 

2801 AATTGTTCTG GCTAATTTAG AAGCAGCAGG CCTTGGAAGT CTTTGTCCTG 

28 51 TGTCCCTGAA CAAATCTTAT GGGAGCTCTG GTACCTATGC CAGAAAATGC 

2901 ACATAGGCAC AACACTTTTA CATACACGTT CACACACCCC ACCCTTATGG 

2951 AGAACTTTTT TCTAAATAAG AGAAAGAAAA ATTTTAAGAC TTACAAGTTA 

3001 TGTTTAGGTA TTTTACATGG TTCAGAAAAC AAGACATGAA GCGGTATAAA 

3051 CTGAGAAGTC TTGTTCCCAC AACCCCACGT GCCAGGTACA CATAACCATT 

3101 TTTATTCACC TCTAGCTTGT GCTTCCAATG TTTGTTAGGC ATATGTAAAT 

3151 AAGTGAATAG ATAAGCATTT CTCCCTCCTT TTGCTGACAT GAGTGGTGGC 

3201 ATGTTTTGCC CCTGGCTTTT ATCCCTTGAC CCCATTCCAG TACCTAGAGA 

3251 CCTGCTTCAT TTTTTTAGAT GTGTAATACT TCATGTGTGC GTGTGCCTTA 

3301 GTGATTAACT CGTGCACTGT GCAGGGACAT CGGGCTGGGA TCAGTTTGTT 

3351 CACTGATATA TACAGCGCTG CGGGAGATAC CCTCACATGT GTATCATTTG 

3401 GTCCATGTGC AGGTGTGTCT GGAAGATAGA ATTCTAGGCG TAGAATTGAT 

3451 AGGTTAAATG TATTTATAGG GAAAAAATCA ATATAAAACT TTGCGTGTAA 

3501 TGATATTTGC GTGCTTTTTT TTTTAATTTT TTTACCCAAA TAGTAAAAAA 

3551 AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 656 bp to 1072 bp; peptide length: 139 
Category: similarity to known protein 



1 MSPSPPMAQM RHSQSLQMME EKTPGCQVCP LSGTPSPSLT ARVPSQPQHG 
51 GYAGAVSLLR YNQLPETTSP LQPLSKVPGQ RSPSLAHPGQ LTEGCPPWRG 
101 ASPLPTGPRP CPGFSPGQSR QDGEVPCQPV LWWGSCSLK 
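The reported peptide lengths can be checked against the ORF coordinates. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; that assumption is not stated in the listing but is consistent with every ORF line quoted in it. The function name is illustrative only.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the peptide length implied by inclusive, 1-based ORF coordinates.

    Assumption: the stop codon is not part of the stated range, so the
    span must be a whole number of codons.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF coordinates and peptide lengths as reported in this listing.
reported = {
    (656, 1072): 139,   # DKFZphfbr2_23o24, frame 2
    (24, 1103): 360,    # DKFZphfbr2_23o5, frame 3
    (132, 632): 167,    # DKFZphfbr2_2a2, frame 3
}

for (start, end), length in reported.items():
    print(start, end, orf_peptide_length(start, end), length)
```

A mismatch between the computed and the reported length would point to a transcription error in either the coordinates or the peptide listing.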



BLASTP hits 



Entry CEEGAP7_1 from database TREMBL: 

gene: "EGAP7.1"; Caenorhabditis elegans cosmid EGAP7 . 

Score = 123, P = 2.3e-07, identities = 35/103, positives = 44/103 

Entry MMBPC35_1 from database TREMBL: 

Mouse carbohydrate binding protein 35 mRNA, 3' end. 

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103 

Entry A28651 from database PIR: 

galactose-specific lectin - mouse >TREMBL:MMMAC2A_1 Mouse mRNA for 
Mac-2 antigen 

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103 



Alert BLASTP hits for DKFZphfbr2_23o24, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23o24, frame 2 

Report for DKFZphfbr2_23o24.2 



[LENGTH] 139 

[MW] 14748.91 

[pi] 8.90 

[PROSITE] PRENYLATION 
[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] PKC_PHOSPHO_SITE 1 

[KW] All_Alpha 



SEQ MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHGGYAGAVSLLR 

PRD cccchhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccchhhhhhhh 

SEQ YNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRGASPLPTGPRPCPGFSPGQSR 

PRD hhcccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QDGEVPCQPVLWWGSCSLK 

PRD ccccccccccccccccccc 



Prosite for DKFZphfbr2_23o24.2 



PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005 

PS00006 119->123 CK2_PHOSPHO_SITE PDOC00006 

PS00008 50->56 MYRISTYL PDOC00008 

PS00013 126->137 PROKAR_LIPOPROTEIN PDOC00013 

PS00294 136->140 PRENYLATION PDOC00266 
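The kind of motif scan behind such a table can be sketched in a few lines. The example assumes the commonly cited PROSITE pattern for the casein kinase II site, [ST]-x(2)-[DE], written here as a regular expression; the pattern and the function name are assumptions for illustration and are not taken from this listing. Positions are reported in the table's apparent convention (1-based start, end one past the last matched residue).

```python
import re

# Frame 2 peptide of DKFZphfbr2_23o24 as listed above (139 residues).
PEPTIDE = (
    "MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHG"
    "GYAGAVSLLRYNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRG"
    "ASPLPTGPRPCPGFSPGQSRQDGEVPCQPVLWWGSCSLK"
)

# Assumed pattern for CK2_PHOSPHO_SITE: [ST]-x(2)-[DE].
CK2_REGEX = r"[ST]..[DE]"


def scan_motif(peptide: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for every, possibly overlapping, match.

    start is 1-based; end is start plus the match length, which mirrors
    the 'start->end' columns of the table above.
    """
    hits = []
    # A lookahead lets the scan find overlapping occurrences.
    for m in re.finditer(f"(?=({regex}))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


print(scan_motif(PEPTIDE, CK2_REGEX))
```

Short patterns like this one match frequently by chance, so a hit is only a weak indication of a functional site.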



(No Pfam data available for DKFZphfbr2_23o24.2) 
DKFZphfbr2_23o5 



group: brain derived 

DKFZphfbr2_23o5 encodes a novel 360 amino acid protein with no known similarity. 
No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

potential start at Bp 24 matches Kozak consensus ANNatgG 
Sequenced by AGOWA 
Locus: /map="7q21-q22" 
Insert length: 1736 bp 

Poly A stretch at pos. 1714, polyadenylation signal at pos. 1680 
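The Kozak check noted above can be reproduced with a short sketch. It reads the consensus ANNatgG as "A at position -3, any two bases, the ATG start codon, G at position +4"; the function name and the choice to pass only the first 40 bases of the insert are illustrative assumptions.

```python
import re


def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Check the ANNatgG consensus around a start codon.

    atg_pos is the 1-based position of the A of the ATG. The window
    covers three bases upstream through the base after the ATG.
    """
    i = atg_pos - 1                      # convert to 0-based index
    if i < 3 or i + 4 > len(seq):
        return False                     # not enough flanking sequence
    window = seq[i - 3:i + 4].upper()
    return re.fullmatch(r"A..ATGG", window) is not None


# First 40 bases of the DKFZphfbr2_23o5 insert as listed below.
prefix = "GGGGGAGGATCAAAGTAGGCAAGATGGCGTCGAGCGGCGG"

print(matches_kozak(prefix, 24))
```

For position 24 the window is AAGATGG, which fits the consensus.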



1 GGGGGAGGAT CAAAGTAGGC AAGATGGCGT CGAGCGGCGG GGAGCCAGGG 
51 AGTTTATTTG ATCACCACGT CCAGAGGGCG GTATGCGACA CACGGGCCAA 
101 ATATCGAGAG GGACGACGGC CTCGTGCTGT GAAGGTATAT ACAATCAATT 
151 TGGAATCTCA GTACTTATTA ATACAAGGAG TTCCTGCTGT GGGAGTCATG 
201 AAGGAATTAG TTGAGCGATT CGCTTTATAT GGTGCAATTG AACAGTACAA 
251 TGCTCTAGAT GAATACCCAG CAGAAGACTT TACTGAAGTT TATCTTATTA 
301 AATTTATGAA CTTACAAAGT GCAAGGACAG CCAAGAGAAA AATGGATGAA 
351 CAGAGTTTCT TCGGTGGATT GCTTCATGTG TGCTATGCTC CAGAATTTGA 
401 AACAGTTGAA GAAACTAGAA AAAAACTACA AATGCGGAAG GCATATGTAG 
451 TAAAAACTAC TGAAAATAAA GACCATTACG TGACAAAGAA GAAATTGGTT 
501 ACAGAGCATA AAGACACAGA GGATTTTAGA CAAGACTTCC ACTCAGAGAT 
551 GTCTGGATTT TGTAAAGCTG CTTTGAACAC TTCTGCAGGG AACTCAAATC 
601 CTTATCTTCC GTATTCCTGT GAATTGCCTT TATGTTATTT CTCCTCAAAA 
651 TGTATGTGTT CATCCGGGGG ACCTGTAGAC AGAGCACCAG ACTCCTCTAA 

701 GGATGGTAGA AACCATCATA AAACAATGGG GCATTATAAC CACAATGACT 
751 CTTTGCGGAA AACACAGATA AACTCTTTGA AAAACTCAGT GGCCTGCCCT 

801 GGTGCACAAA AGGCTATTAC GTCTTCAGAG GCAGTTGACA GATTTATGCC 
851 TAGGACAACA CAACTGCAGG AGCGCAAAAG AAGAAGAGAA GATGATCGTA 
901 AACTTGGAAC TTTTCTTCAA ACAAACCCAA CTGGTAATGA GATTATGATT 
951 GGACCTCTGT TACCAGACAT CTCTAAAGTG GATATGCACG ATGACTCATT 

1001 GAATACAACG GCGAATTTAA TTCGGCATAA ACTTAAAGAG GTATTTCATC 
1051 TGTGCCAAAG CCTCCAGAGG ACAAGCCAGA AGATGTACAT ACAAGTCATC 
1101 CATTAAAACA AAGAAGAAGA ATATAGAGTG CCAGCAGCAA CTTAGTATTT 
1151 TCTAAAAAGA ACATTTATTA TTTATTTTTA GCCTGTCATT TTAATTCTTC 
1201 AAGAGATTTT ACTGCTGGTA TTTTTTGATG CACTCCTCTT TGTAATTTCA 
1251 TTCAAGCCAT TTGTCTAAAG TCATTTCTTT GTTTTTTGGG AGATGGAGTC 
1301 TTGCTCTGTT GCCCAGGCTG GAATGCAGTG GCGTGATCTC GGCTCACTGC 
1351 AACCTCCACC TCCCGGGTTC AAGCGATTCT CCTGCCTCAG CCTCCTGAGT 
1401 ATCTGGGATT ACAGGCGTGC ACCACCATGC CTGGCTAAGT TTTGTGTTTT 
1451 TTTTAGTAGA GATGGGTTTT CACCATATTG GTCAGGCTGG TCTCGAACTC 
1501 CTGACCTTGT GATACACCTG CCTCAGCCTC CCAAAGGGAT GAGCCACCGC 
1551 GCCTGGCCCA TTTCTTCTTT TTTTGACCCA TACTTAATGT TGCAGAAACT 
1601 ATTCTTGTCA TAACATTATC TCTCATGTAC AGTAATTATA TGTAAATTAA 
1651 TTGAAGCAAA TATGGAAACT TTACAATAGA AATAAAGATA GGCAGCCAGC 
1701 GTCTGTTTCC AATTATAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry AC005156 from database EMBL: 

Homo sapiens PAC clone DJ1099C19 from 7q21-q22, complete sequence. 
Score = 2897, P = 2.4e-154, identities = 583/586 
2 exons covering Bp 465-1723 



Medline entries 



No Medline entry 



Peptide information for frame 3 
ORF from 24 bp to 1103 bp; peptide length: 360 
Category: similarity to unknown protein 



1 MASSGGEPGS LFDHHVQRAV CDTRAKYREG RRPRAVKVYT INLESQYLLI 
51 QGVPAVGVMK ELVERFALYG AIEQYNALDE YPAEDFTEVY LIKFMNLQSA 
101 RTAKRKMDEQ SFFGGLLHVC YAPEFETVEE TRKKLQMRKA YVVKTTENKD 
151 HYVTKKKLVT EHKDTEDFRQ DFHSEMSGFC KAALNTSAGN SNPYLPYSCE 
201 LPLCYFSSKC MCSSGGPVDR APDSSKDGRN HHKTMGHYNH NDSLRKTQIN 
251 SLKNSVACPG AQKAITSSEA VDRFMPRTTQ LQERKRRRED DRKLGTFLQT 
301 NPTGNEIMIG PLLPDISKVD MHDDSLNTTA NLIRHKLKEV FHLCQSLQRT 
351 SQKMYIQVIH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23o5, frame 3 

TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome 
II BAC F15K20 genomic sequence, complete sequence., N = 2, Score = 114, 
P = 3.6e-11 



>TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II 
BAC F15K20 genomic sequence, complete sequence. 
Length = 227 

HSPs: 



Score = 114 (17.1 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11 
Identities = 21/41 (51%), Positives = 29/41 (70%) 

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVV 143 
AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+ 
Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVL 91 




Score = 107 (16.1 bits), Expect = 2.6e-10, Sum P(2) = 2.6e-10 
Identities = 50/191 (26%), Positives = 83/191 (43%) 

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEH 162 
AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+ + T + VT+ 
Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVLARLNPQKEKSTSQ--VTKL 108 

Query: 163 KDTEDFRQDFHSEMSGFCKAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAP 222 
+ D S + + GN+ P S + YF+S M + V 
Sbjct: 109 AGPALTQTDNVSSQRREMEYQFHR--GNA-PVTRVSSDQE--YFASSSMNQTVKTV 159 

Query: 223 DSSKDGRNHHKTMGHYNHNDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQ 282 
K++++H+++N+ P+Q S RP ++Q+Q 
Sbjct: 160 -REKLNKTREENISSLSHCKQIEESG-NQKRLQ---PSSQTQPEESGNQKRLQP-SSQIQ 213 

Query: 283 -ERKRRREDDRK 293 
+ KR R D+R+ 
Sbjct: 214 PDLKRTRVDNRR 225 




Score = 102 (15.3 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11 
Identities = 22/55 (40%), Positives = 38/55 (69%) 

Query: 26 KYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMKELVERFALYGAIEQY--NALDE 80 
+Y++ P AV+VYT+ ES+Y++++ VPA+G +L+ F YG +E++ LDE 
Sbjct: 3 RYKD-ETP-AVRVYTVCDESRYMIVRNVPALGCGDDLMRLFMTYGEVEEFAKRKLDE 57 





Pedant information for DKFZphfbr2_23o5, frame 3 



Report for DKFZphfbr2_23o5.3 



[LENGTH] 360 

[MW] 41105.85 

[pi] 8.89 

[HOMOL] TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II BAC 
F15K20 genomic sequence, complete sequence. 5e-12 

[PROSITE] AMIDATION 1 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.17 % 



SEQ MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMK 

SEG 

PRD ccccccccceeeecceeeeehhhhhhhhhccccceeeeeeecccceeeeeeccccchhhh 

SEQ ELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSARTAKRKMDEQSFFGGLLHVC 

SEG 

PRD hhhhhhhhhhhhhhhhhhccccccceeeeeeehhhhhhhhhhhhhhhhhccccccceeee 

SEQ YAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEHKDTEDFRQDFHSEMSGFC 

SEG 

PRD eccchhhhhhhhhhhhhhhhheeeeccccceeeeeeeeeeeccccchhhhhhhhhcccce 

SEQ KAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNH 

SEG 

PRD eeeeccccccccccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ NDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT 

SEG xxxxxxxxxkxxxxx 

PRD cccceeeeccccccccccccceeeeecceeeeeccccchhhhhhhhhhhhccceeeeeec 

SEQ NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRTSQKMYIQVIH 

SEG 

PRD cccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhccc 



Prosite for DKFZphfbr2_23o5.3 



PS00001 185->189 ASN_GLYCOSYLATION PDOC00001 

PS00001 241->245 ASN_GLYCOSYLATION PDOC00001 

PS00001 327->331 ASN_GLYCOSYLATION PDOC00001 

PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005 

PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005 

PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005 

PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 

PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 

PS00005 224->227 PKC_PHOSPHO_SITE PDOC00005 

PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 

PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005 

PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 

PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006 

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 

PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006 

PS00006 224->228 CK2_PHOSPHO_SITE PDOC00006 

PS00006 266->270 CK2_PHOSPHO_SITE PDOC00006 

PS00006 303->307 CK2_PHOSPHO_SITE PDOC00006 

PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 

PS00008 5->11 MYRISTYL PDOC00008 

PS00008 260->266 MYRISTYL PDOC00008 

PS00009 29->33 AMIDATION PDOC00009 



(No Pfam data available for DKFZphfbr2_23o5.3) 
DKFZphfbr2_2a2 



group: brain derived 

DKFZphfbr2_2a2.3 encodes a novel 167 amino acid protein with weak similarity to human 52K 
autoantigen Ro/SS-A. 

The novel protein contains a C3HC4 zinc finger ("RING finger") motif. 

This domain is probably involved in mediating protein-protein interactions. 

Proteins containing a RING-finger are: mammalian V(D)J recombination activating protein 

(RAG1), mouse rpt-1, human rfp, human 52 Kd Ro/SS-A protein and others. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to 52K autoantigen Ro/SS-A - human 

complete cDNA, complete cds, few EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1376 bp 

Poly A stretch at pos. 1355, polyadenylation signal at pos. 1340 



1 GGGGACTCCA AATTAGAAAG GGGACGTCTA GTGGGTTGCC CGGGAGGGGT 
51 GGCGGGAGCG GTCCTGGAAA TAATCTGTCC TCTGTCGCCG GGAACTGGCG 
101 AGGTAGTTCC TTCGCGGTGG AGAGACCTGG AATGGCCAAA TATCAAGGTG 
151 AAGTTCAAAG TTTGAAACTG GATGATGATT CAGTTATAGA AGGAGTAAGC 
201 GACCAAGTAC TTGTGGCAGT TGTGGTCAGT TTCGCTTTGA TTGCTACCCT 
251 GGTATATGCA CTTTTCAGAA ATGTACATCA AAACATTCAC CCAGAAAACC 
301 AGGAGCTAGT AAGGGTACTT CGAGAACAGC TTCAAACAGA ACAGGATGCA 
351 CCTGCTGCCA CTCGACAGCA GTTCTACACT GACATGTACT GTCCCATCTG 
401 CCTGCACCAA GCCTCCTTCC CGGTGGAGAC CAACTGTGGA CATCTTTTTT 
451 GTGGTGCCTG CATTATTGCT TACTGGCGAT ATGGTTCATG GCTTGGGGCA 
501 ATCAGTTGTC CAATCTGTAG ACAAACGGTA ACCTTACTCC TAACAGTATT 
551 TGGTGAAGAT GATCAGTCTC AGGATGTTCT GAGATTGCAT CAGGATATTA 
601 ATGATTATAA CCGGAGATTC TCAGGGCAAC CCTGATCTAT TATGGAGAGA 
651 ATTATGGATC TACCCACTTT ACTGAGGCAT GCATTCAGGG AAATGTTTTC 
701 AGTCGGGGGC CTTTTCTGGA TGTTTCGCAT CAGGATAATA CTTTGTTTAA 
751 TGGGAGCTTT TTTCTATCTT ATATCACCTC TAGATTTTGT ACCTGAAGCC 
801 TTGTTTGGAA TTCTAGGCTT TCTAGATGAT TTCTTTGTCA TCTTTTTATT 
851 GCTTATCTAC ATCTCTATTA TGTATCGAGA AGTGATAACC CAAAGGCTAA 
901 CTAGATGAAA AAGGAAACAA AACTGAGTTT ACTAGGATAT CTGAGCTAAT 
951 GTAGAACATC AAACAGAAGG ACCCATGGCA GTATAAAGCA ATGAAGCAAT 
1001 GGAGTATTAT CTCACAAATA TAAAACCACT ATAAGACAAA CATTTGATTA 
1051 TCATTTGACA AATACCTAGG TATAACTGGA ATTTTCATGT TTGAAGTTCT 
1101 AATATTAAGT TTAGAATTAT AATGATCTAC AGTTGTATCT TGATTCTATG 
1151 TTGTCTGGAA AAAATATGGA ATTATATAAA AAGGGATGCT TTTATATATT 
1201 TTTCTTTTCC CCAGAATTAC TTAGATTAAT TAGATGTATA GTAAAATATT 
1251 GTTAAATGTC AGTTTATCCA TCTTATCCTT CTCAGCAGGT ACCTATATGA 
1301 TAATATATAG CTGTGAAACT CATCTAAATA TTTTTGTTCC AATAAAATAT 
1351 TATATACTAA AAAAAAAAAA AAAAAA 
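The reported polyadenylation signal position can be cross-checked by searching the tail of the insert for the canonical AATAAA hexamer. The sketch below makes two assumptions that are not stated in the listing: that AATAAA is the signal being reported, and that the reported position is a 0-based offset (Python's `str.find` also returns a 0-based index, so the two can be compared directly). The function name and the `offset` parameter are illustrative.

```python
from typing import Optional


def find_polya_signal(fragment: str, offset: int = 0) -> Optional[int]:
    """Return the 0-based position of the first AATAAA in the insert.

    fragment is a piece of the insert; offset is the number of bases
    that precede it in the full sequence.
    """
    idx = fragment.upper().find("AATAAA")
    return None if idx < 0 else offset + idx


# Rows 1301 and 1351 of the DKFZphfbr2_2a2 insert as listed above.
tail = (
    "TAATATATAGCTGTGAAACTCATCTAAATATTTTTGTTCCAATAAAATAT"
    "TATATACTAAAAAAAAAAAAAAAAAA"
)

print(find_polya_signal(tail, offset=1300))
```

Under these assumptions the search agrees with the position 1340 given in the clone header; counted from 1, the hexamer starts at base 1341.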



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 132 bp to 632 bp; peptide length: 167 
Category: similarity to known protein 
Classification: unset 
Prosite motifs: ZINC_FINGER_C3HC4 (102-112) 



1 MAKYQGEVQS LKLDDDSVIE GVSDQVLVAV VVSFALIATL VYALFRNVHQ 
51 NIHPENQELV RVLREQLQTE QDAPAATRQQ FYTDMYCPIC LHQASFPVET 
101 NCGHLFCGAC IIAYWRYGSW LGAISCPICR QTVTLLLTVF GEDDQSQDVL 
151 RLHQDINDYN RRFSGQP 
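The start of this reading frame can be verified by translating the insert from the reported ORF start. The sketch below uses the standard genetic code and only row 101 of the insert, so the start position is given relative to that row (absolute base 132 is base 32 of the row). The function name and the compact codon-table construction are illustrative choices, not part of the listing.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, start: int) -> str:
    """Translate complete codons from a 1-based start position.

    Translation stops at the first stop codon; a trailing partial
    codon is ignored.
    """
    peptide = []
    seq = dna.upper()
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Row 101 of the DKFZphfbr2_2a2 insert (bases 101-150) as listed above.
row_101 = "AGGTAGTTCCTTCGCGGTGGAGAGACCTGGAATGGCCAAATATCAAGGTG"

print(translate(row_101, 32))
```

The six residues obtained from this row agree with the first six residues of the peptide listing.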

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2a2, frame 3 

TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid 
Y38F1A, N = 1, Score = 194, P = 2e-15 

PIR:T05222 hypothetical protein F17I5.130 - Arabidopsis thaliana, N = 
1, Score = 159, P = 1.4e-10 

TREMBLNEW:AB025011_1 gene: "TRIF"; product: "Trif-d"; Mus musculus 
mRNA for Trif-d, complete cds., N = 1, Score = 108, P = 2.6e-06 

PIR:A37241 52K autoantigen Ro/SS-A - human, N = 1, Score = 115, P = 
5e-05 



>TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 
Length = 283 

HSPs: 

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15 
Identities = 52/149 (34%), Positives = 78/149 (52%) 

Query: 16 DSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELVRVLREQLQTEQDAPA 75 

D +E ++ Q+ +A+ V F ++ + A Q E RQ+ T++ 

Sbjct: 41 DPDVE-LATQITMATAVIF-IVKAIFDAWQSRRRQRAASRMDENAE--RNQIITQRRISE 96 

Query: 76 ATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSWLGA-ISCPICRQTVT 134 

A Q + CPICL ASFPV T+CGH+FC CII YW+ + C +CR T 

Sbjct: 97 ALHQSSHE---CPICLANASFPVLTDCGHIFCCECIIQYWQQSKAIVTPCDCAMCRSTFY 153 

Query: 135 LLLTV----FGEDDQSQDVLRLHQ-DINDYNRRFS 164 

+LL V G +++ D ++ + I+DYNRRFS 

Sbjct: 154 MLLPVHWPTMGTSEETDDHIQENNIRIDDYNRRFS 188 



Pedant information for DKFZphfbr2_2a2, frame 3 



Report for DKFZphfbr2_2a2.3 



[LENGTH] 167 

[MW] 18941.65 

[pi] 4.91 

[HOMOL] TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 1e- 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 1e-04 

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 1e-04 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR323c] 2e-04 

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 

[PROSITE] ZINC_FINGER_C3HC4 1 

[PFAM] Zinc finger, C3HC4 type (RING finger) 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.59 % 

SEQ MAKYQGEVQSLKLDDDSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELV 

SEG xxxxxxxxxxx 

1rmd- 

SEQ RVLREQLQTEQDAPAATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSW 

SEG 

1rmd- HHHHHHBTTTTTEETTTEEEETTTEEEEHHHHH HHHHH 



SEQ LGAISCPICRQTVTLLLTVFGEDDQSQDVLRLHQDINDYNRRFSGQP 
SEG 

1rmd- HCCB-TTTTT 



Prosite for DKFZphfbr2_2a2.3 

PS00518 102->112 ZINC_FINGER_C3HC4 PDOC00449 



Pfam for DKFZphfbr2_2a2.3 



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CP 
CPIC L+ P++++CGH+FC +CI+ + CP 
Query 87 CPICLHQASFPVETNCGHLFCGACIIAYWRYGSWLGAISCP 127 

HMM 
+C 
Query 128 IC 129 
DKFZphfbr2_2bl7 



group: transmembrane protein 

DKFZphfbr2_2bl7 encodes a novel 285 amino acid protein with similarity to D. melanogaster 30K 
protein. 

The protein contains 3 transmembrane regions. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to Drosophila hypothetical 30K protein 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 3 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1426 bp 

Poly A stretch at pos. 1345, polyadenylation signal at pos. 1330 



1 GGGGGTATTT CCAAGGACTC CAAAGCGAGG CCGGGGACTG AAGGTGTGGG 
51 TGTCGAGCCC TCTGGCAGAG GGTTAACCTG GGTCAAATGC ACGGATTCTC 
101 ACCTCGTACA GTTACGCTCT CCCGCGGCAC GTCCGCGAGG ACTTGAAGTC 
151 CTGAGCGCTC AAGTTTGTCC GTAGGTCGAG AGAAGGCCAT GGAGGTGCCG 
201 CCACCGGCAC CGCGGAGCTT TCTCTGTAGA GCATTGTGCC TATTTCCCCG 
251 AGTCTTTGCT GCCGAAGCTG TGACTGCCGA TTCGGAAGTC CTTGAGGAGC 
301 GTCAGAAGCG GCTTCCCTAC GTCCCAGAGC CCTATTACCC GGAATCTGGA 
351 TGGGACCGCC TCCGGGAGCT GTTTGGCAAA GATGAACAGC AGAGAATTTC 
401 AAAGGACCTT GCTAATATCT GTAAGACGGC GGCTACAGCA GGCATCATTG 
451 GCTGGGTGTA TGGGGGAATA CCAGCTTTTA TTCATGCTAA ACAACAATAC 
501 ATTGAGCAGA GCCAGGCAGA AATTTATCAT AACCGGTTTG ATGCTGTGCA 
551 ATCTGCACAT CGTGCTGCCA CACGAGGCTT CATTCGTTAT GGCTGGCGCT 
601 GGGGTTGGAG AACTGCAGTG TTTGTGACTA TATTCAACAC AGTGAACACT 
651 AGTCTGAATG TATACCGAAA TAAAGATGCC TTAAGCCATT TTGTAATTGC 
701 AGGAGCTGTC ACGGGAAGTC TTTTTAGGAT AAACGTAGGC CTGCGTGGCC 
751 TGGTGGCTGG TGGCATAATT GGAGCCTTGC TGGGCACTCC TGTAGGAGGC 
801 CTGCTGATGG CATTTCAGAA GTACTCTGGT GAGACTGTTC AGGAAAGAAA 
851 ACAGAAGGAT CGAAAGGCAC TCCATGAGr.7 AAAACTGGAA GAGTGGAAAG 
901 GCAGACTACA AGTTACTGAG CACCTCCCTG AGAAAATTGA AAGTAGTTTA 
951 CAGGAAGATG AACCTGAGAA TGATGCTAAG AAAATTGAAG CACTGCTAAA 
1001 CCTTCCTAGA AACCCTTCAG TAATAGATAA ACAAGACAAG GACTGAAAGT 
1051 GCTCTGAACT TGAAACTCAC TGGAGAGCTG AAGGGAGCTG CCATGTCCGA 
1101 TGAATGCCAA CAGACAGGCC ACTCTTTGGT CAGCCTGCTG ACAAATTTAA 
1151 GTGCTGGTAC CTGTGGTGGC AGTGGCTTGC TCTTGTCTTT TTCTTTTCTT 
1201 TTTAACTAAG AATGGGGCTG TTGTACTCTC ACTTTACTTA TCCTTAAATT 
1251 TAAATACATA CTTATGTTTG TATTAATCTA TCAATATATG CATACATGAA 
1301 TATATCCACC CACCTAGATT TTAAGCAGTA AATAAAACAT TTCGCAAAAG 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HSG19630 from database EMBL: 
human STS A001T27. 
Score = 961, P = 1.2e-36, identities = 193/194 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 189 bp to 1043 bp; peptide length: 285 
Category: similarity to unknown protein 

1 MEVPPPAPRS FLCRALCLFP RVFAAEAVTA DSEVLEERQK RLPYVPEPYY 
51 PESGWDRLRE LFGKDEQQRI SKDLANICKT AATAGIIGWV YGGIPAFIHA 
101 KQQYIEQSQA EIYHNRFDAV QSAHRAATRG FIRYGWRWGW RTAVFVTIFN 
151 TVNTSLNVYR NKDALSHFVI AGAVTGSLFR INVGLRGLVA GGIIGALLGT 
201 PVGGLLMAFQ KYSGETVQER KQKDRKALHE LKLEEWKGRL QVTEHLPEKI 
251 ESSLQEDEPE NDAKKIEALL NLPRNPSVID KQDKD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2bl7, frame 3 

PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly 
(Drosophila melanogaster), N = 1, Score = 312, P = 6.1e-28 



>PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly 
(Drosophila melanogaster) 
Length = 261 

HSPs: 



Score = 312 (46.8 bits), Expect = 6.1e-28, P = 6.1e-28 
Identities = 68/231 (29%), Positives = 125/231 (54%) 

Query: 30 ADSEVLEERQKRLPYVPEPYYPESGWDRLRELFGKDEQQRISKDLANICKTAATAGIIGW 89 
AD V +E + ++ E+G +RL+++F DE I +L ++ + +IG 
Sbjct: 23 ADEIVDKENKTYKAFLASKPPEETGLERLKQMFTIDEFGSIFSELNSVYQAGFLGFLIGA 82 

Query: 90 VYGGIPAFIHAKQQYIEQSQAEIYHNRFDAVQSAHRAATRGFIRYGWRWGWRTAVFVTIF 149 
+YGG+ A ++E +QA + + FDA + T F + G++WGWR + F T + 
Sbjct: 83 IYGGVTQSRVAYMNFMENNQATAFKSHFDAKKKLQDQFTVNFAKGGFKWGWRVGLFTTSY 142 

Query: 150 NTVNTSLNVYRNKDALSHFVIAGAVTGSLFRINVGLRGLVAGGIIGALLGTPVGGLLMAF 209 
+ T ++VYR K ++ ++ AG++TGSL+++++GLRG+ AGGIIG LG G + 
Sbjct: 143 FGIITCMSVYRGKSSIYEYLAAGSITGSLYKVSIGLRGMAAGGIIGGFLGGVAGVTSLLL 202 

Query: 210 QKYSGETVQERKQKDRKALHELKLEEWKGRLQVTEHLPEKIESSLQEDEPE 260 
K SG +++E ++ ++K RL E++ + + +++ PE 
Sbjct: 203 MKASGTSMEE-----------VRYWQYKWRLDRDENIQQAFKKLTEDENPE 242 





Pedant information for DKFZphfbr2_2bl7, frame 3 



Report for DKFZphfbr2_2bl7.3 



[LENGTH] 285 

[MW] 32177.88 

[pi] 8.65 

[HOMOL] PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly (Drosophila 
melanogaster) 7e-20 

[PROSITE] MYRISTYL 7 

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] SIGNALPEPTIDE 25 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 5.96 % 



SEQ MEVPPPAPRSFLCRALCLFPRVFAAEAVTADSEVLEERQKRLPYVPEPYYPESGWDRLRE 

SEG 

PRD cccccccceeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhh 

MEM 

SEQ LFGKDEQQRISKDLANICKTAATAGIIGWVYGGIPAFIHAKQQYIEQSQAEIYHNRFDAV 

SEG 

PRD hhcccchhhhhhhhhhhhhhhhcccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QSAHRAATRGFIRYGWRWGWRTAVFVTIFNTVNTSLNVYRNKDALSHFVIAGAVTGSLFR 

SEG 

PRD hhhhhhhhhhhccccccccceeeeeeeeccccccceeecccccccceeeeecccccceee 

MEM ^^M^lMM^IM^t^IMMMMMMMM^MMMMM^^MM M 

SEQ INVGLRGLVAGGIIGALLGTPVGGLLMAFQKYSGETVQERKQKDRKALHELKLEEWKGRL 
SEG . . xxxxxxxxxxxxxxxxx 

PRD eecccccccccceeeeeccccccchhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QVTEHLPEKIESSLQEDEPENDAKKIEALLNLPRNPSVIDKQDKD 

SEG 

PRD ccccccccchhhhhccccccchhhhhhhhhhcccccceeeccccc 

MEM 



Prosite for DKFZphfbr2_2bl7.3 



PS00001 153->157 ASN_GLYCOSYLATION PDOC00001 

PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006 

PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006 

PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006 

PS00006 253->257 CK2_PHOSPHO_SITE PDOC00006 

PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006 

PS00008 92->98 MYRISTYL PDOC00008 

PS00008 172->178 MYRISTYL PDOC00008 

PS00008 187->193 MYRISTYL PDOC00008 

PS00008 191->197 MYRISTYL PDOC00008 

PS00008 195->201 MYRISTYL PDOC00008 

PS00008 199->205 MYRISTYL PDOC00008 

PS00008 204->210 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_2bl7.3) 
DKFZphfbr2_2b5 



group: cell structure and motility 

DKFZphfbr2_2b5 encodes a novel 957 amino acid protein with strong similarity to collagens. 

The novel protein contains the typical (xxG)n repeat of collagen proteins and a 
Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a new collagen 
alpha chain. 
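The (xxG)n repeat mentioned above is easy to quantify. The sketch below counts the longest uninterrupted run of glycine-led triplets (G-x-y, the same periodicity read in a different register) in a peptide; it is an illustration of the idea, not the method used to annotate this clone, and the function name is hypothetical. The excerpt is residues 451-500 of the frame 2 peptide listed further below.

```python
import re


def longest_gxy_run(peptide: str) -> int:
    """Return the number of triplets in the longest run of G-x-y repeats.

    A lookahead is used so that runs starting at every glycine are
    considered, not only non-overlapping leftmost matches.
    """
    runs = (
        len(m.group(1)) // 3
        for m in re.finditer(r"(?=((?:G..)+))", peptide.upper())
    )
    return max(runs, default=0)


# Residues 451-500 of the DKFZphfbr2_2b5 frame 2 peptide.
excerpt = "PGLQGPKGDPGLPGNPGYPGQPGQDGKPGYQGIAGTPGVPGSPGIQGARG"

print(longest_gxy_run(excerpt))
```

In this 50-residue excerpt every third residue is glycine, which is the signature expected of a collagen triple-helical region; a globular protein would typically give runs of only one or two triplets.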

The new protein can find application in modulation of connective tissue, bone and cartilage 
development and maintenance. 



similarity to collagen proteins 

shows typical (xxG)n repeat of collagen proteins 
[PFAM] von Willebrand factor type A domain 

Sequenced by Qiagen 

Locus: /map="6" 

Insert length: 4160 bp 

Poly A stretch at pos. 4141, polyadenylation signal at pos. 4119 



1 GGGGGCCCGC TGCAGGGAGA ACGGACTCCG GGCGGAGGGC AGCCAATCCG 
51 TTTCAGCGCA GGTCTTGCTC GGGTTGGGCT TGCCACTGCC TGGAACATAC 
101 CTGTCCCCCT GGCGCAACAC TCAGCTGGCT GCGACCGCAA CCCCGAGCCT 
151 GGACACTGCG CCAGGAATCC TAAAACCAAA ATATTAGAAC GAAAACAGAA 
201 ACATGGCTCA CTATATTACA TTTCTCTGCA TGGTTTTGGT GCTGCTTCTT 
251 CAGAATTCTG TGTTAGCTGA AGATGGGGAA GTAAGATCAA GTTGTCGTAC 
301 TGCTCCGACA GATTTAGTTT TCATCTTAGA TGGCTCTTAT AGTGTTGGCC 
351 CAGAAAACTT TGAAATAGTG AAAAAGTGGC TTGTCAATAT CACAAAAAAC 
401 TTTGACATAG GGCCGAAGTT TATTCAAGTT GGAGTGGTTC AATATAGTGA 
451 CTACCCTGTG CTGGAGATTC CTCTCGGAAG CTATGATTCA GGAGAACATT 
501 TGACGGCAGC AGTGGAATCC ATACTCTACT TAGGAGGAAA CACAAAGACA 
551 GGGAAGGCCA TCCAGTTTGC GCTCGATTAC CTTTTTGACA AGTCCTCACG 
601 ATTTCTGACT AAGATAGCAG TGGTACTTAC GGATGGCAAG TCCCAAGATG 
651 ACGTCAAGGA TGCAGCTCAA GCAGCAAGAG ATAGTAAGAT AACATTATTT 
701 GCTATTGGTG TTGGTTCAGA AACAGAAGAT GCCGAACTTA GAGCTATTGC 
751 CAACAAGCCT TCGTCTACTT ATGTGTTTTA TGTGGAAGAC TATATTGCAA 
801 TATCCAAAAT AAGGGAAGTG ATGAAGCAGA AACTTTGTGA AGAATCTGTC 
851 TGTCCAACAC GAATTCCAGT GGCAGCTCGT GATGAAAGGG GATTTGATAT 
901 TCTTTTGGGT TTAGATGTAA ATAAAAAGGT TAAGAAAAGA ATACAGCTTT 
951 CACCAAAAAA GATAAAAGGA TATGAAGTAA CATCAAAAGT TGATTTATCA 
1001 GAACTCACAA GCAATGTTTT CCCAGAAGGT CTTCCTCCAT CATATGTATT 
1051 TGTGTCTACT CAAAGATTTA AAGTCAAGAA AATTTGGGAT TTATGGAGAA 
1101 TATTAACTAT TGATGGAAGG CCACAAATAG CAGTTACCTT AAATGGTGTG 
1151 GACAAAATCT TATTATTTAC AACAACCAGC GTAATTAATG GCTCACAAGT 
1201 GGTTACCTTT GCTAACCCTC AAGTTAAGAC GTTGTTTGAT GAAGGCTGGC 
1251 ACCAAATTCG TCTCTTAGTA ACAGAACAAG ATGTGACTTT GTATATTGAT 
1301 GACCAACAAA TTGAAAACAA GCCCTTACAT CCAGTTTTAG GGATCTTGAT 
1351 CAATGGGCAA ACCCAAATTG GAAAATATTC TGGAAAAGAA GAAACTGTTC 
1401 AGTTTGATGT CCAAAAGTTG CGAATCTACT GTGACCCAGA ACAGAACAAC 
1451 CGGGAGACAG CATGTGAGAT TCCTGGATTT AATGGAGAGT GCCTTAATGG 
1501 TCCCAGTGAT GTAGGTTCAA CTCCAGCTCC CTGTATTTGT CCTCCGGGAA 
1551 AACCAGGACT TCAAGGCCCC AAAGGTGACC CTGGACTGCC TGGGAACCCT 
1601 GGCTACCCTG GACAACCTGG TCAAGATGGT AAGCCTGGAT ATCAGGGAAT 
1651 TGCAGGGACA CCAGGTGTTC CAGGATCTCC AGGAATACAA GGAGCTCGAG 
1701 GACTACCAGG TTACAAAGGA GAACCAGGGC GAGATGGTGA CAAGGGTGAT 
1751 CGTGGACTTC CTGGTTTTCC TGGGCTTCAT GGCATGCCAG GATCAAAGGG 
1801 TGAAATGGGT GCCAAAGGAG ACAAAGGATC ACCTGGATTT TATGGCAAAA 
1851 AGGGTGCAAA AGGTGAAAAG GGGAATGCTG GCTTCCCTGG CCTCCCTGGA 
1901 CCTGCTGGAG AACCAGGAAG ACATGGAAAG GATGGATTAA TGGGTAGTCC 
1951 CGGTTTCAAG GGAGAAGCAG GATCCCCTGG TGCTCCGGGG CAGGATGGAA 
2001 CACGGGGAGA GCCTGGAATC CCAGGATTTC CTGGAAACCG AGGATTAATG 
2051 GGCCAAAAGG GAGAAATTGG GCCTCCAGGA CAGCAAGGAA AAAAAGGAGC 
2101 CCCAGGGATG CCTGGTTTAA TGGGAAGCAA TGGCTCACCA GGCCAGCCTG 
2151 GAACACCGGG ATCTAAGGGA AGCAAAGGTG AACCTGGAAT TCAAGGGATG 
2201 CCTGGGGCTT CAGGGCTCAA GGGAGAACCA GGAGCAACGG GTTCCCCAGG 
2251 AGAACCAGGA TACATGGGTT TACCCGGGAT TCAAGGAAAA AAGGGGGACA 
2301 AAGGAAATCA AGGTGAAAAA GGTATTCAGG GTCAAAAGGG AGAAAATGGA 
2351 AGACAGGGAA TTCCAGGGCA ACAGGGAATT CAAGGCCATC ATGGTGCAAA 
2401 AGGAGAGAGA GGTGAAAAGG GAGAACCTGG TGTCCGAGGT GCCATTGGAT 
2451 CAAAAGGAGA ATCTGGGGTG GATGGCTTGA TGGGGCCCGC AGGTCCTAAG 
2501 GGGCAACCTG GGGATCCAGG TCCTCAGGGA CCCCCAGGTT TGGATGGGAA 
2551 GCCCGGAAGA GAGTTTTCAG AACAATTTAT TCGACAAGTT TGCACAGATG 
2601 TAATAAGAGC CCAGCTACCA GTCTTACTTC AGAGTGGAAG AATTAGAAAT 
2651 TGTGATCATT GCCTGTCCCA ACATGGCTCC CCGGGTATTC CTGGGCCACC 
2701 TGGTCCGATA GGCCCAGAGG GTCCCAGAGG ATTACCTGGT TTGCCAGGAA 
2751 GAGATGGTGT TCCTGGATTA GTGGGTGTCC CTGGACGTCC AGGTGTCAGA 
2801 GGATTAAAAG GCCTACCAGG AAGAAATGGG GAAAAAGGGA GCCAAGGGTT 
2851 TGGGTATCCT GGAGAACAAG GTCCTCCTGG TCCCCCAGGT CCAGAGGGCC 

2901 CTCCTGGAAT AAGCAAAGAA GGTCCTCCAG GAGACCCAGG TCTCCCTGGC 
2951 AAAGATGGAG ACCATGGAAA ACCTGGAATC CAAGGGCAAC CAGGCCCCCC 
3001 AGGCATCTGC GACCCATCAC TATGTTTTAG TGTAATTGCC AGAAGAGATC 
3051 CGTTCAGAAA AGGACCAAAC TATTAGTGTC TGATGCCTCA TTCAGCAGCC 
3101 TAGGCATGGT GCTTTTTCTG TGGTCTTTTG CATCTCAGGA AGATAACCAA 
3151 CAGTATCCCT TGAAAAGAAA CTTAAGTACC TCGGTGTTTT TATTTTTTTT 
3201 TTCTTATGGA AAAAAATATA AAAGATCACA TATACTGATT TTAAAGGCTC 
3251 CTCAGTCATT TGGAGCCCTT GGATTAGCAG CATTAATTAA ATCTCAAGGG 
3301 TTTCTTGTAA AGTCCATTTA TGTTAATCAA AGTTGAATAT AAAAATCCAC 
3351 CATTGCCTGT TAGCCAGTCA GTTTTAGTCA CTGTGAAATA TTTCACATTC 
3401 AGCCTCCATG CAGTAGAGAT TTGAGTTTAA TTTCATGTCC ATGTGACTTT 
3451 CATGTTTCCT ATCTCATAGC TCATGCTACT ACATAAGCCA AAACATGTAT 

3501 CTCATCATTG GAAGTAAGAT CAGGGCTGAT ATTCACCTGG GATAGACAGT 
3551 ATTGGTGAAC TACTCATTTA CTACAGTGTC TCAGCCTTGA TAAAGGGCAG 
3601 TGGATTGCCT GTTGTTCGGT GTTGTGAATA GCACCTCTGA ATAAGATTAG 
3651 AGTGTTTCTT AATTCATTTC AAACTCTAAA ATTAGATTAA TGGTGGTGCT 
3701 AAGAAAGAGT ATTAATTACT TTGGGAATGG TCAAAATTAA CATTAAAAAC 
3751 ATTTTAGACA AAAAGTTTCA TTGTACATTC AAAGAAAATG TAAGTTTGGA 
3801 AGTACTAAAA GACTATTTTA TACTTGTTGA TTAATCGGAA TGTTTGTTGT 
3851 ATGCCTTCAT TTTCCATTTC ACTTATATGT GCATGTCCAT ATATGTTAAT 
3901 TTTCATTGTA GCAAAGCTAA TGGAAATAAA GCTAATGCTC TAGTTGAAAG 
3951 AAAAGGAAAA CTCCTGAAAT CCTAGAATGT CTTGTTATTT TTAGCTGACT 

4001 GTAAAATATT ATGAACAGTC TTTGTGTATT GTGCTTAATG CTTTTGTAAG 
4051 AAACAGAATT TGAAATATTT CATCCTTGTC ATGCTCAAAA TTTTGTTACA 
4101 TGCTTGTTAT TCAGAGTATA ATAAAGTTTT GTACAGGCCT GAAAAAAAAA 
4151 AAAAAAAAAA 



BLAST Results 



Entry HS682J15 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 682J15 
Score = 6240, P = 0.0e+00, identities = 1256/1263 
13 exons matching Bp 2015-4118 

Entry HS708F5 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 708F5 
Score = 2775, P = 1.0e-221, identities = 739/912 
10 exons matching Bp 5-1745 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 203 bp to 3073 bp; peptide length: 957 
Category: similarity to known protein 



1 MAHYITFLCM VLVLLLQNSV LAEDGEVRSS CRTAPTDLVF ILDGSYSVGP 

51 ENFEIVKKWL VNITKNFDIG PKFIQVGVVQ YSDYPVLEIP LGSYDSGEHL 

101 TAAVESILYL GGNTKTGKAI QFALDYLFDK SSRFLTKIAV VLTDGKSQDD 

151 VKDAAQAARD SKITLFAIGV GSETEDAELR AIANKPSSTY VFYVEDYIAI 

201 SKIREVMKQK LCEESVCPTR IPVAARDERG FDILLGLDVN KKVKKRIQLS 

251 PKKIKGYEVT SKVDLSELTS NVFPEGLPPS YVFVSTQRFK VKKIWDLWRI 

301 LTIDGRPQIA VTLNGVDKIL LFTTTSVING SQVVTFANPQ VKTLFDEGWH 

351 QIRLLVTEQD VTLYIDDQQI ENKPLHPVLG ILINGQTQIG KYSGKEETVQ 

401 FDVQKLRIYC DPEQNNRETA CEIPGFNGEC LNGPSDVGST PAPCICPPGK 

451 PGLQGPKGDP GLPGNPGYPG QPGQDGKPGY QGIAGTPGVP GSPGIQGARG 

501 LPGYKGEPGR DGDKGDRGLP GFPGLHGMPG SKGEMGAKGD KGSPGFYGKK 

551 GAKGEKGNAG FPGLPGPAGE PGRHGKDGLM GSPGFKGEAG SPGAPGQDGT 

601 RGEPGIPGFP GNRGLMGQKG EIGPPGQQGK KGAPGMPGLM GSHGSPGQPG 

651 TPGSKGSKGE PGIQGMPGAS GLKGEPGATG SPGEPGYMGL PGIQGKKGDK 

701 GNQGEKGIQG QKGENGRQGI PGQQGIQGHH GAKGERGEKG EPGVRGAIGS 

751 KGESGVDGLM GPAGPKGQPG DPGPQGPPGL DGKPGREFSE QFIRQVCTDV 

801 IRAQLPVLLQ SGRIRNCDHC LSQHGSPGIP GPPGPIGPEG PRGLPGLPGR 
851 DGVPGLVGVP GRPGVRGLKG LPGRNGEKGS QGFGYPGEQG PPGPPGPEGP 
901 PGISKEGPPG DPGLPGKDGD HGKPGIQGQP GPPGICDPSL CFSVIARRDP 
951 FRKGPNY 

BLASTP hits 
Entry HSCOL7A1X_1 from database TREMBL: 

gene: "COL7A1"; product: "collagen type VII"; Homo sapiens (clones: 
CW52-2, CW27-6, CW15-2, CW26-5, 11-67) collagen type VII intergenic 
region and (COL7A1) gene, complete cds. 

Score = 949, P = 3.4e-122, identities = 237/553, positives = 281/553 
Entry CA17_HUMAN from database SWISSPROT: 

COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LC 
COLLAGEN). >TREMBL:HSCOL7A1_1 gene: "COL7A1"; product: "alpha-1 type 
VII collagen"; Human alpha-1 type VII collagen (COL7A1) mRNA, complete 
cds. 

Score = 949, P = 3.6e-122, identities = 237/553, positives = 281/553 



Alert BLASTP hits for DKFZphfbr2_2b5, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2b5, frame 2 
[Huma 8e-62 

[EC] 3.1.1.7 Acetylcholinesterase 7e-24 

[PIRKW] blocked amino end 1e-43 

[PIRKW] duplication 7e-46 

[PIRKW] cornea 1e-35 

[PIRKW] lung 2e-40 

[PIRKW] leukocyte 1e-42 

[PIRKW] skin 1e-40 

[PIRKW] transmembrane protein 1e-37 

[PIRKW] cartilage 3e-59 

[PIRKW] hydroxylysine 4e-62 

[PIRKW] connective tissue 3e-43 

[PIRKW] triple helix 5e-82 

[PIRKW] homotrimer 2e-37 

[PIRKW] bone 6e-40 

[PIRKW] Alport syndrome 1e-42 

[PIRKW] laminin binding 2e-40 

[PIRKW] liver 2e-40 

[PIRKW] glycoprotein 5e-82 

[PIRKW] carboxylic ester hydrolase 7e-24 

[PIRKW] disulfide bond 7e-46 

[PIRKW] cell binding 7e-46 

[PIRKW] heterotrimer 4e-62 

[PIRKW] calcium binding 8e-28 

[PIRKW] alternative splicing 5e-82 

[PIRKW] coiled coil 5e-82 

[PIRKW] basement membrane 7e-46 

[PIRKW] trimer 5e-82 

[PIRKW] pyroglutamic acid 3e-43 

[PIRKW] hydroxyproline 4e-62 

[PIRKW] extracellular matrix 5e-82 

[PIRKW] chondroitin sulfate proteoglycan 6e-41 

[PIRKW] sulfoprotein 7e-39 

[PIRKW] kidney 1e-42 

[PIRKW] angiogenesis inhibitor 6e-36 

[PIRKW] Ehlers-Danlos syndrome 2e-40 

[SUPFAM] fibronectin type III repeat homology 5e-82 

[SUPFAM] scavenger receptor cysteine-rich domain homology 1e-37 

[SUPFAM] C-type lectin homology 6e-30 

[SUPFAM] collagen alpha 2(I) chain 5e-40 

[SUPFAM] collagen alpha 1(I) chain 6e-44 
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[SUPFAM] fibrillar collagen carboxyl-terminal homology 6e-44 

[SUPFAM] animal Kunitz-type proteinase inhibitor homology 2e-38 

[SUPFAM) fibronectin type II repeat homology 6e-21 

[SUPFAM] complement Clq carboxyl-terminal homology le-38 

[SUPFAM] collagen alpha 3 (VI) chain 2e-31 

[SUPFAM] collagen alpha 1(IV) chain 7e-46 

[SUPFAM] collagen alpha 1(VI) chain 2e-37 

[SUPFAM] von Willebrand factor type C repeat homology Se-44 

[SUPFAM] unassigned collagens 4e-62 

[SUPFAM] von Willebrand factor type A repeat homology 5e-82 

[SUPFAM] collagen alpha KXIV) chain 5e-82 

[SUPFAM] pulmonary surfactant protein D 6e-30 

[SUPFAM] collagen alpha 1 (V) chain 7e-39 

[SUPFAM] collagen alpha l(VIII) chain le-38 

[SUPFAM] EGF homology le-35 

[PROSITE] AMIDATION 3 

[PROSITE] MYRISTYL 14 

[PROSITE] CK2_PHOSPHO_SITE 13 

[PROSITE] PKC_PHOSPHO_SITE 8 

[ PROSITE] ASN_GLYCOS YLATION 2 

[ PFAM] von Willebrand factor type A domain 

[KW] Irregular 

[KW] 3D 

[KW] SIGNAL_PEPTIDE 23 

[KW] LOW_COMPLEXITY 24.24 % 

SEQ MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL 

SEG 

latzB CCCEEEEEEEECCCCCCHHHHHHHHHHH 



SEQ VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI 

SEG 

latzB HHHHHHCCBTTTTEEEEEEEETTTEEEEETTTTTTTHHHHHHHHHHCCCCCCCCCHHHHH 

SEQ QFALDYLFDKSSRFLTKIAVVLTDGKSQDDVKDAAQAARDSKITLFAIGVGSETEDAELR 

SEG 

latzB HHHHHHHHCCTTTTTEEEEEEEECCCTTTTHHHHHHHHHHHCEEEEEEEECCCCCHHHHH 

SEQ AIANKPSSTYVFYVEDYIAISKIREVMKQKLCEESVCPTRIPVAARDERGFDILLGLDVN 

SEG 

latzB HHHGGGGGGGCECCHHHHHHHHHCHHHHHHHH 

SEQ KKVKKRIQLSPKKIKGYEVTSKVDLSELTSNVFPEGLPPSYVFVSTQRFKVKKIWDLWRI 

SEG 

latzB 



SEQ LTIDGRPQIAVTLNGVDKILLFTTTSVINGSQVVTFANPQVKTLFDEGWHQIRLLVTEQD 

SEG 

latzB 

SEQ VTLYIDDQQIENKPLHPVLGILINGQTQIGKYSGKEETVQFDVQKLRIYCDPEQNNRETA 

SEG 

latzB 

SEQ CEIPGFNGECLNGPSDVGSTPAPCICPPGKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

latzB 

SEQ QGIAGTPGVPGSPGIQGARGLPGYKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGD 

SEG XX 

latzB 

SEQ KGSPGFYGKKGAKGEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQDGT 

SEG xxxxxxxxxxxxx 

latzB 

SEQ RGEPGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQPGTPGSKGSKGE 

SEG xxxxxxxxxxxxxxxxxxxxxx 

latzB 

SEQ PGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKGDKGNQGEKGIQGQKGENGRQGI 

SEG xxxxxxxxxxxxxxxxxxxxx 

latzB 

SEQ PGQQGIQGHHGAKGERGEKGEPGVRGAIGSKGESGVDGLMGPAGPKGQPGDPGPQGPPGL 

SEG xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 

latzB 

SEQ DGKPGREFSEQFIRQVCTDVIRAQLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEG 

SEG xxxxx xxxxxxxxxxxxxxxx 
latzB 

SEQ PRGLPGLPGRDGVPGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPEGP 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX xxxxxxxxxxxxxxxxxxx 

latzB 



SEQ PGISKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARRDPFRKGPNY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

latzB 



Prosite for DKFZphfbr2_2b5.2 

PS00001 62->66 ASN_GLYCOSYLATION PDOC00001 
PS00001 329->333 ASN_GLYCOSYLATION PDOC00001 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005 
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 286->289 PKC_PHOSPHO_SITE PDOC00005 
PS00005 393->396 PKC_PHOSPHO_SITE PDOC00005 
PS00005 811->814 PKC_PHOSPHO_SITE PDOC00005 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006 
PS00006 261->265 CK2_PHOSPHO_SITE PDOC00006 
PS00006 343->347 CK2_PHOSPHO_SITE PDOC00006 
PS00006 357->361 CK2_PHOSPHO_SITE PDOC00006 
PS00006 393->397 CK2_PHOSPHO_SITE PDOC00006 
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006 
PS00006 531->535 CK2_PHOSPHO_SITE PDOC00006 
PS00006 600->604 CK2_PHOSPHO_SITE PDOC00006 
PS00006 657->661 CK2_PHOSPHO_SITE PDOC00006 
PS00006 681->685 CK2_PHOSPHO_SITE PDOC00006 
PS00006 750->754 CK2_PHOSPHO_SITE PDOC00006 
PS00006 754->758 CK2_PHOSPHO_SITE PDOC00006 
PS00008 92->98 MYRISTYL PDOC00008 
PS00008 112->118 MYRISTYL PDOC00008 
PS00008 236->242 MYRISTYL PDOC00008 
PS00008 276->282 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 
PS00008 494->500 MYRISTYL PDOC00008 
PS00008 527->533 MYRISTYL PDOC00008 
PS00008 596->602 MYRISTYL PDOC00008 
PS00008 638->644 MYRISTYL PDOC00008 
PS00008 650->656 MYRISTYL PDOC00008 
PS00008 653->659 MYRISTYL PDOC00008 
PS00008 665->671 MYRISTYL PDOC00008 
PS00008 743->749 MYRISTYL PDOC00008 
PS00008 746->752 MYRISTYL PDOC00008 
PS00009 547->551 AMIDATION PDOC00009 
PS00009 628->632 AMIDATION PDOC00009 
PS00009 694->698 AMIDATION PDOC00009 
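
The Prosite hits listed above can be reproduced, at least for the N-terminal part of the peptide, with a plain regular-expression scan. The following is a minimal sketch and not the Pedant pipeline itself: the function name `scan` and the dictionary `PATTERNS` are hypothetical, the two regular expressions are hand translations of the PROSITE patterns PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]), and the coordinate convention (1-based start, end equal to start plus motif length) is inferred from the listing rather than documented in it.

```python
import re

# PROSITE-style patterns rewritten as Python regular expressions
# (hand translation; PS00001 = N-{P}-[ST]-{P}, PS00005 = [ST]-x-[RK]).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
}


def scan(sequence, pattern):
    """Return (start, end) pairs: start is 1-based, end = start + motif length.

    The end convention is an assumption inferred from the listing above.
    """
    hits = []
    # A lookahead keeps overlapping matches that a plain finditer would skip.
    for match in re.finditer(f"(?=({pattern}))", sequence):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


# First 120 residues of the DKFZphfbr2_2b5.2 peptide, taken from the SEQ lines.
N_TERM = (
    "MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL"
    "VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI"
)

for name, pattern in PATTERNS.items():
    for start, end in scan(N_TERM, pattern):
        print(f"{start}->{end} {name}")
```

Within these 120 residues the scan finds the N-glycosylation site at 62 and the protein kinase C sites at 30 and 116, which agrees with the first entries of the table.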



Pfam for DKFZphfbr2_2b5.2 

HMM_NAME von Willebrand factor type A domain 

HMM *DIVFLIDGSdSIGpqNFNrMKDFIeRMMERMDIgPDwIRVGVVQYSdNP 
D+VF++DGS S+GP NF+++K+ ++++ ++DIGP+ I+VGVVQYSD P 
Query 37 DLVFILDGSYSVGPENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYP 85 

HMM RqEmrFmFNDYQNKeEILQalqqMM/WMgggTNTGeAIQYVvrNMFweer 
E +++ Y + E++++A+ ++ ++GG T+TG AIQ++++++F +++ 
Query 86 VLE--IPLGSYDSGEHLTAAVESIL-YLGGNTKTGKAIQFALDYLFDKSS 132 

HMM GmRWenvPQVMIIITDGRSQDDIRDpIneMrrmaGIqvFalGIGNhDNnn 
+ +++++++TDG+SQDD++D+++++R+ I+ FAIG+G 
Query 133 RF----LTKIAVVLTDGKSQDDVKDAAQAARD-SKITLFAIGVGSETE-- 175 

HMM WeELRelASePdEdHVFyVdDFeeLdnMqeqL* 
+ELR IA++P++ +VFYV+D+ +++ ++E + 
Query 176 DAELRAIANKPSSTYVFYVEDYIAISKIREVM 207 
DKFZphfbr2_2cl 
group: brain derived 

DKFZphfbr2_2cl encodes a novel 697 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 3973 bp 

Poly A stretch at pos. 3914, polyadenylation signal at pos. 3900 



1 GGGGGGATTT CGGCGGCGGA AACATGGCGG TCGCGGCCGG GCCGGTAACG 
51 GAGAAAGTTT ACGCCGACAC TGGCCTGTAT TAGCGCGTAT GGCCTCGGGC 
101 CCTCGTTCCC CAAGGCGTGC CGCCTCCCTG TTCTCAGTCG CAGGCTGAAG 
151 CCTTGTCTGC TCTCCTCCTT TTTGGTTTGG TTTTGGAACT GACTCCGAGG 
201 GTTGGGAGAG CGCGTTGGTG GCGACGGCCG AGTCAGATCA CTATAAACAA 
251 AATTTCCACA AGAGAAAATG TTGAAATAGG AGTTGCGGAT ACATTGGATA 
301 TACTGGATGA AATACAAGCG GTTAATTTTT GTAACGTGAG GGAAAAGCCC 
351 ACATTGCTGG TTACATGTGT AAATCACTGC GTTATTGCTT TAGTCATTGT 
401 CTCTATTTAG CAATGACAAG ACTGGAAGAA GTAAATAGAG AAGTGAACAT 
451 GCATTCTTCA GTGCGGTATC TTGGCTATTT AGCCAGAATC AATTTATTGG 
501 TTGCTATATG CTTAGGTCTA TACGTAAGAT GGGAAAAAAC AGCAAATTCC 
551 TTAATTTTGG TAATTTTTAT TCTTGGTCTT TTTGTTCTTG GAATCGCCAG 
601 CATACTCTAT TACTATTTTT CAATGGAAGC AGCAAGTTTA AGTCTCTCCA 
651 ATCTTTGGTT TGGATTCTTG CTTGGCCTCC TATGTTTTCT TGATAATTCA 
701 TCCTTTAAAA ATGATGTAAA AGAAGAATCA ACCAAATATT TGCTTCTAAC 
751 ATCCATAGTG TTAAGGATAT TGTGCTCTCT GGTGGAGAGA ATTTCTGGCT 
801 ATGTCCGTCA TCGGCCCACT TTACTAACCA CAGTTGAATT TCTGGAGCTT 
851 GTTGGATTTG CCATTGCCAG CACAACTATG TTGGTGGAGA AGTCTCTGAG 
901 TGTCATTTTG CTTGTTGTAG CTCTGGCTAT GCTGATTATT GATCTGAGAA 
951 TGAAATCTTT CTTAGCTATT CCAAACTTAG TTATTTTTGC AGTTTTGTTA 
1001 TTTTTTTCCT CATTGGAAAC TCCCAAAAAT CCGATTGCTT TTGCGTGTTT 
1051 TTTTATTTGC CTGATAACTG ATCCTTTCCT TGACATTTAT TTTAGTGGAC 
1101 TTTCAGTAAC TGAAAGATGG AAACCCTTTT TGTACCGTCG AAGAATTTGC 
1151 AGAAGACTTT CAGTCGTTTT TGCTGGAATG ATTGAGCTTA CATTTTTTAT 
1201 TCTTTCCGCA TTCAAACTTA GAGACACTCA CCTCTGGTAT TTTGTAATAC 
1251 CTGGCTTTTC CATTTTTGGA ATTTTCAGGA TGATTTGTCA TATTATTTTT 
1301 CTTTTAACTC TTTGGGGATT CCATACCAAA TTAAATGACT GCCATAAAGT 
1351 ATATTTTACT CACAGGACAG ATTACAATAG CCTTGATAGA ATCATGGCAT 
1401 CCAAAGGGAT GCGCCATTTT TGCTTGATTT CAGAGCAGTT GGTGTTCTTT 
1451 AGTCTTCTTG CAACAGCGAT TTTGGGAGCA GTTTCCTGGC AGCCAACAAA 
1501 TGGAATTTTC TTGAGCATGT TCCTAATCGT TTTGCCATTG GAATCCATGG 
1551 CTCATGGGCT CTTCCATGAA TTGGGTAACT GTTTAGGAGG AACATCTGTT 
1601 GGATATGCTA TTGTGATTCC CACCAACTTC TGCAGTCCTG ATGGTCAGCC 
1651 AACACTGCTT CCCCCAGAAC ATGTACAGGA GTTAAATTTG AGGTCTACTG 
1701 GCATGCTCAA TGCTATCCAA AGATTTTTTG CATATCATAT GATTGAGACC 
1751 TATGGATGTG ACTATTCCAC AAGTGGACTG TCATTTGATA CTCTGCATTC 
1801 CAAACTAAAA GCTTTCCTCG AACTTCGGAC AGTGGATGGA CCCAGACATG 
1851 ATACGTATAT TTTGTATTAC AGTGGGCACA CCCATGGTAC AGGAGAGTGG 
1901 GCTCTAGCAG GTGGAGATAC ACTACGCCTT GACACACTTA TAGAATGGTG 
1951 GAGAGAAAAG AATGGTTCCT TTTGTTCCCG GCTTATTATC GTATTAGACA 
2001 GCGAAAATTC AACCCCTTGG GTGAAAGAAG TGAGGAAAAT TAATGACCAG 
2051 TATATTGCAG TGCAAGGAGC AGAGTTGATA AAAACAGTAG ATATTGAAGA 
2101 AGCTGACCCG CCACAGCTAG GTGACTTTAC AAAAGACTGG GTAGAATATA 
2151 ACTGCAACTC CTGTAATAAC ATCTGCTGGA CTGAAAAGGG ACGCACAGTG 
2201 AAAGCAGTAT ATGGTGTGTC AAAACGGTGG AGTGACTACA CTCTGCATTT 
2251 GCCAACGGGA AGCGATGTGG CCAAGCACTG GATGTTACAC TTTCCTCGTA 
2301 TTACATATCC CCTAGTGCAT TTGGCAAATT GGTTATGCGG TCTGAACCTT 
2351 TTTTGGATCT GCAAAACTTG TTTTAGGTGC TTGAAAAGAT TAAAAATGAG 
2401 TTGGTTTCTT CCTACTGTGC TGGACACAGG ACAAGGCTTC AAACTTGTCA 
2451 AATCTTAATT TGGACCCCAA AGCGGGATAT TAATAAGCAC TCATACTACC 
2501 AATTATCACT AACTTGCCAT TTTTTGTATG CTGTATTTTT ATTTGTGGAA 
2551 AATACCTTGC TACTTCTGTA GCTGCTCTCA CTTTGTCTTT TCTTAAGTAA 
2601 TTATGGTATA TATAAGGCGT TGGGAAAAAA CATTTTATAA TGAAAGTATG 
2651 TAGGGAGTCA AATGCTTACT GTAAATGCAT AAGAGACGTT AAAAATAACA 
2701 CTGCACTTTC AGGAATGTTT GCTTATGGTC CTGATTAGAA AGAAACAGTT 
2751 GTCTATGCTC TGCAATGGTC AATGATGAAT TACTAATGCC TTATTTTCTA 
2801 GGCATATAAT AATAGTTTAG AGAATGTAGA CCAGATAAAT TTGTTTACTG 
2851 TTTTAAGAAA ACTACCAGTT TACTTACAGA AGATTCTTTT TTCCAAACAG 
2901 TAGGTTTCAT CCAAGACCAT TTGAAGAACT GCAAACTCTT TCTCTTAGAA 
2951 AAGAAAGAGG GCAGCCTAAA ATAAACGCAA AATTTGCTTA TACTCCATCA 
3001 CATTCAGATG TCTTGGTTGT GACTTATTAC CAGTGTGGCA GAGAACCCAA 
3051 GTTACATTTT AGATCAAAAT ATTCTTTATG TAGGTATTGT TAAAAGGCTA 
3101 GAGCCTACAA GTTGCTCTTC CATGCGTTGG TCAGGGGGCC CTGAAAACAC 
3151 TGGTAATATT AAGAGTCTTT CTCAGGGTAA CTTAATGTTT TCTTAATGAA 
3201 CAGTGTTTCC AGCTACAAAT TCTTCCAATA AATTGTCTTC CTTTTTGAAA 
3251 AGTACTCTCA TAGAAGAAAT TTAGCAATTT CTCGTTGACT GACTCAGTCT 
3301 ATTTTAAGTA TTCAGAAAAG ATTTTGATCC CCATTGAGTT AATGCTCTGC 
3351 CTTGAAAATT ATTTTTCTGA TCCTTGTTAG TGATAACATT TTTTTTCTAC 
3401 TGAAGGTCAG AGGATAGGAA ACAAGTATTT CTCTTCTGGT ATACATGTAA 
3451 TGTATTCTGT AAAAAAGTAT TCATATTGGC AATTTTAGTT AGGCATAATA 
3501 TTGTGGTTGT AATTTTTAAA ACTTAGTGTT TTGTCTGATT AAAGCAGGCA 
3551 CTGATCAGGG TATCTCCTAA GAGGTAATTC ACTTCTTATT CCTTTCCAAT 
3601 AATTATTACA TTCTAAATTT TCATCTATGA GAAATAACAA ACAAGAAGGG 
3651 AATAGAATTA AATTGGGGTA TAATCTAATC TTCATTGTTT AAATGGTTTG 
3701 CCTTCTCACC ATTGAAGCCA TTTTTTTATA GCCTCAGAAA GAGGAAATAA 
3751 TGCCTCCACC ATTTTCTACC TGGTGACTTG AAAATTGAAC TTTTAAGTTA 
3801 GGAAGAAGTT AGAGTCAGGG AACTTGTATA CCACTATCTA TGCAGCATTG 
3851 TTATAGTCTG ATTATTTCTG TGTTTTGAAT ATGATTTTCC TAATGCTCTA 
3901 AATAAAATTT TGTTAAAAAT CAAAAAAAAA AAAAAAAAAA CTTATCGATA 
3951 CCGTCGACCT CGATGATGTC GAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 365 bp to 2455 bp; peptide length: 697 
Category: putative protein 
Classification: unset 
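
The ORF coordinates and the peptide length quoted above are consistent with each other if the coordinates are read as 1-based and inclusive, with the stop codon left out of the range. The sketch below checks that arithmetic and translates the first twelve codons of the cDNA starting at base 365. It is an illustration only: the helper names `orf_peptide_length` and `translate` are hypothetical, the standard genetic code is assumed, and the coordinate convention is inferred from the numbers in this report.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base over T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an ORF given as 1-based inclusive coordinates.

    Assumes the range excludes the stop codon, as the report appears to do.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Bases 365-400 of the DKFZphfbr2_2cl cDNA listed above.
ORF_START = "ATGTGTAAATCACTGCGTTATTGCTTTAGTCATTGT"

print(orf_peptide_length(365, 2455))  # peptide length implied by the coordinates
print(translate(ORF_START))           # first twelve residues of the peptide
```

The same calculation applied to an ORF from 9 bp to 1346 bp gives 446 residues.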



1 MCKSLRYCFS HCLYLAMTRL EEVNREVNMH SSVRYLGYLA RINLLVAICL 
51 GLYVRWEKTA NSLILVIFIL GLFVLGIASI LYYYFSMEAA SLSLSNLWFG 
101 FLLGLLCFLD NSSFKNDVKE ESTKYLLLTS IVLRILCSLV ERISGYVRHR 
151 PTLLTTVEFL ELVGFAIAST TMLVEKSLSV ILLVVALAML IIDLRMKSFL 
201 AIPNLVIFAV LLFFSSLETP KNPIAFACFF ICLITDPFLD IYFSGLSVTE 
251 RWKPFLYRGR ICRRLSVVFA GMIELTFFIL SAFKLRDTHL WYFVIPGFSI 
301 FGIFRMICHI IFLLTLWGFH TKLNDCHKVY FTHRTDYNSL DRIMASKGMR 
351 HFCLISEQLV FFSLLATAIL GAVSWQPTNG IFLSMFLIVL PLESMAHGLF 
401 HELGNCLGGT SVGYAIVIPT NFCSPDGQPT LLPPEHVQEL NLRSTGMLNA 
451 IQRFFAYHMI ETYGCDYSTS GLSFDTLHSK LKAFLELRTV DGPRHDTYIL 
501 YYSGHTHGTG EWALAGGDTL RLDTLIEWWR EKNGSFCSRL IIVLDSENST 
551 PWVKEVRKIN DQYIAVQGAE LIKTVDIEEA DPPQLGDFTK DWVEYNCNSC 
601 NNICWTEKGR TVKAVYGVSK RWSDYTLHLP TGSDVAKHWM LHFPRITYPL 
651 VHLANWLCGL NLFWICKTCF RCLKRLKMSW FLPTVLDTGQ GFKLVKS 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2cl, frame 2 

PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii, N = 1, 
Score = 96, P = 0.12 



>PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii 
Length = 288 

HSPs: 

Score = 96 (14.4 bits), Expect = 1.3e-01, P = 1.2e-01 
Identities = 59/234 (25%), Positives = 116/234 (49%) 
Query: 77 IASILYYYFSMEAASLSLSNLWFGFLL--GL--LCFLDNSSFKNDVKEESTKYLLLTSIV 132 
++ +LYY F+ A ++ L G+LL + L +L N + V+ + K + ++ 
Sbjct: 57 

Query: 133 
++ T+ FL+LVG ++ +L E +L ++ L+ L + 
Sbjct: 116 -VI FTLVFLKLVGLQYSTQVILAEVTLFLVFLLYDLTPKHV 168 

Query: 193 
M SF + + F +LL F T +N I + FI 
Sbjct: 169 

Query: 252 
P ++ R+ R S+++ + L TF +L +F L +T L ++IP F++ + ++ 
Sbjct: 226 -PPRDFKRRV-ERFSMMYLQVTSLSTFTVLVSFVYLGNTDLLRQYLIP-FAVNVVLILLS 282 

Query: 309 
Sbjct: 283 



Pedant information for DKFZphfbr2_2cl, frame 2 
Report for DKFZphfbr2_2cl.2 



[LENGTH] 697 

[MW] 79741.46 

[pI] 8.41 

[KW] TRANSMEMBRANE 11 

[KW] LOW_COMPLEXITY 9.76 % 



SEQ MCKSLRYCFSHCLYLAMTRLEEVNREVNMHSSVRYLGYLARINLLVAICLGLYVRWEKTA 

SEG 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ NSLILVIFILGLFVLGIASILYYYFSMEAASLSLSNLWFGFLLGLLCFLDNSSFKNDVKE 

SEG . . xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

MEM . . . MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ ESTKYLLLTSIVLRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSV 

SEG xxxxxxxxxxxx xxxx 

PRD ccchhhhhhhhhhhhhhhhhhhceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM ....MMMMMMMMMMMMMMMMM MMM 

SEQ ILLVVALAMLIIDLRMKSFLAIPNLVIFAVLLFFSSLETPKNPIAFACFFICLITDPFLD 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhcccccccccchhhhhhhhhcccccee 

MEM MMMMMMMMMMMMMM . . .MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM. 

SEQ IYFSGLSVTERWKPFLYRGRICRRLSVVFAGMIELTFFILSAFKLRDTHLWYFVIPGFSI 

SEG 

PRD eeeccccccccccceeecccccccchhhhhhhhhhhhhhhhhhhccccceeeeeeccccc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FGIFRMICHIIFLLTLWGFHTKLNDCHKVYFTHRTDYNSLDRIMASKGMRHFCLISEQLV 

SEG 

PRD hhhhhhhhhhhhhhhhhcccccccceeeeeeeccccccchhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MM 

SEQ FFSLLATAILGAVSWQPTNGIFLSMFLIVLPLESMAHGLFHELGNCLGGTSVGYAIVIPT 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhheeehhhhhhhhhhccccccccccceeeeeeec 

MEM MMMMMMMMMMMMMMM . . . .MMMMMMMMMMMMMMMMM 

SEQ NFCSPDGQPTLLPPEHVQELNLRSTGMLNAIQRFFAYHMIETYGCDYSTSGLSFDTLHSK 

SEG 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhccccccccccccchhhhhh 

MEM 

SEQ LKAFLELRTVDGPRHDTYILYYSGHTHGTGEWALAGGDTLRLDTLIEWWREKNGSFCSRL 

SEG 

PRD hhhhhhhhhccccccceeeeeeccccccccceeeccccchhhhhhhhhhhhccccceeee 

MEM 

SEQ IIVLDSENSTPWVKEVRKINDQYIAVQGAELIKTVDIEEADPPQLGDFTKDWVEYNCNSC 
SEG 

PRD eeeeecccccccchhhhhhccceeeeccceeeeeeeecccccccccccccceeeeccccc 

MEM 

SEQ NNICWTEKGRTVKAVYGVSKRWSDYTLHLPTGSDVAKHWMLHFPRITYPLVHLANWLCGL 

SEG 

PRD cceeeecccceeeeeeeecccccceeeecccccchhhhhhhcccccccchhhhhhhhhcc 

MEM 

SEQ NLFWICKTCFRCLKRLKMSWFLPTVLDTGQGFKLVKS 

SEG 

PRD eeeeeehhhhhhhhhhhhhhcceeeeccccccccccc 

MEM 



(No Prosite data available for DKFZphfbr2_2cl.2) 
(No Pfam data available for DKFZphfbr2_2cl.2) 
group: signal transduction 

DKFZphfbr2_2cl7.3 encodes a novel 446 amino acid protein with similarity to yeast YMR131C and 
mammalian retinoblastoma-binding protein RbAp46. 

The protein contains 1 WD-40 repeat, which is typical for the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. 

The new protein can find application in modulating/blocking G-protein-dependent pathways. 



similarity to YMR131C and retinoblastoma-binding protein RbAp46 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

insert length: 2248 bp 

Poly A stretch at pos. 2230, polyadenylation signal at pos. 2200 
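
The two positions quoted above can be recovered from the last line of the cDNA listing if they are read as zero-based offsets. This is an inference from the numbers in this report, not something the report states. The sketch below searches the final 48 bases of the insert for the canonical AATAAA hexamer and for the start of the terminal run of A residues; the function names are hypothetical.

```python
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation hexamer


def signal_offset(tail, tail_offset):
    """Zero-based offset of the first AATAAA in tail, or None if absent."""
    index = tail.find(POLYA_SIGNAL)
    return None if index < 0 else tail_offset + index


def terminal_a_run_offset(tail, tail_offset):
    """Zero-based offset at which the uninterrupted 3' run of A residues starts."""
    return tail_offset + len(tail.rstrip("A"))


# Bases 2201-2248 of the DKFZphfbr2_2cl7 cDNA, i.e. zero-based offsets 2200-2247.
TAIL = "AATAAAGGACCAGGAAGAGGCCTGAGGTGGAAAAAAAAAAAAAAAAAA"
TAIL_OFFSET = 2200

print(signal_offset(TAIL, TAIL_OFFSET))          # polyadenylation signal
print(terminal_a_run_offset(TAIL, TAIL_OFFSET))  # start of the poly A stretch
```

A strict run search like this one does not tolerate interruptions in the A stretch, so it will not necessarily agree with the reported position for other clones.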



1 TGGGGAAGAT GGCGGCGCGC AAGGGTCGGC GTCGCACGTG TGAAACCGGG 
51 GAACCCATGG AAGCCGAGTC CGGCGACACA AGTTCCGAGG GCCCGGCCCA 
101 GGTCTACCTG CCCGGCCGGG GGCCGCCGCT ACGCGAAGGG GAGGAGCTGG 
151 TCATGGACGA GGAGGCCTAT GTGCTCTACC ACCGAGCGCA GACTGGCGCC 
201 CCCTGTCTCA GCTTTGACAT AGTCCGGGAT CACCTGGGAG ACAACCGGAC 
251 AGAGCTTCCT CTTACACTTT ACTTGTGTGC TGGGACCCAG GCTGAGAGCG 
301 CCCAGAGCAA CAGACTGATG ATGCTTCGGA TGCACAATCT GCATGGGACA 
351 AAGCCCCCAC CCTCAGAGGG CAGTGATGAA GAAGAAGAGG AGGAAGATGA 
401 AGAGGATGAA GAAGAGCGGA AACCTCAGCT GGAGCTGGCC ATGGTGCCCC 
451 ACTATGGTGG CATCAACCGA GTTCGGGTGT CATGGCTGGG TGAAGAGCCT 
501 GTGGCTGGGG TGTGGTCAGA GAAGGGCCAG GTGGAGGTGT TTGCGCTGCG 
551 GCGGCTTCTG CAGGTGGTGG AGGAGCCCCA GGCCCTGGCA GCCTTCCTCC 
601 GGGATGAGCA GGCCCAAATG AAGCCCATCT TCTCCTTCGC TGGACACATG 
651 GGCGAGGGCT TTGCCCTTGA CTGGTCCCCC CGGGTGACCG GTCGCCTGCT 
701 GACCGGTGAC TGTCAAAAGA ACATCCACCT CTGGACACCT ACGGACGGCG 
751 GCTCCTGGCA CGTGGACCAG CGGCCATTCG TGGGCCACAC ACGCTCTGTG 
801 GAGGACCTGC AGTGGTCACC GACTGAGAAC ACGGTGTTTG CCTCCTGCTC 
851 AGCTGACGCC TCCATCCGCA TCTGGGACAT CCGGGCAGCC CCCAGCAAGG 
901 CCTGCATGCT CACCACAGTC ACCGCCCATG ATGGGGACGT CAATGTCATC 
951 AGCTGGAGCC GCCGGGAGCC CTTCCTGCTC AGTGGCGGGG ATGATGGGGC 
1001 CCTCAAGATC TGGGACCTTC GGCAGTTCAA GTCTGGTTCC CCAGTGGCCA 
1051 CCTTCAAGCA GCACGTGGCC CCCGTGACCT CCGTCGAGTG GCACCCCCAG 
1101 GACAGCGGGG TCTTTGCAGC CTCGGGTGCA GACCACCAGA TCACACAGTG 
1151 GGACCTGGCA GTGGAGCGGG ACCCTGAGGC GGGCGACGTG GAGGCCGACC 
1201 CCGGACTGGC CGACCTCCCG CAGCAGCTGC TGTTCGTGCA CCAGGCCGAG 
1251 ACCGAGCTGA AGGAGCTGCA CTGGCACCCG CAGTGCCCAG GGCTCCTGGT 
1301 CAGCACGGCG CTGTCAGGCT TCACCATCTT CCGCACCATC AGCGTCTGAG 
1351 GCGTCCCACT GGCTCTGATC TTGCTTCCTG CTTGGAAACT GAAGTCGAAT 
1401 TGGGCTCCCC TGGAAGGGGT TCATTCAGGT CTGTTGACTG AGACTGGCCG 
1451 GCCTGTGGGC TGCCGTGATG GATTCTGTTT GACGTATTGT TCTCTAGAAG 
1501 GCCTGGCTCT GATCCAGTGA CCCCTCTCAC CAAAGAACTC GGTTTAACCA 
1551 GGGCTCTGTA AGACCACTCC CACCCAGAGA CTTGTGTGGC CTGGTGTGGC 
1601 CTGTGTGTCG GATTCCTTCC TGTCAGCTGT GACCCATTTG ACCTGTGTCC 
1651 CCAGAACCCA GTTTTTTGTT TGTTTGTTTG AGACGGAGTC TTGGTCTGTC 
1701 GCCCAGGCTG GAGTGCAGTA GCACGATCTT GGCTCACTGC AACCTCCGCC 
1751 TCCTGGGTTA AAGTGATTCT CTCAGCTCAG TCTCCCAGGT AGCTGGGATT 
1801 ACAGGCATGT GCCACCACAC CCCGTTAATT TTTGTATTTT TAGTAGAGAC 
1851 GGGGTTTCAC CATGTTGGCC AGGCTGGTCT CAAATTCTTG ATCTCAAGTG 
1901 ATCTGTCCGC CCCGGCCTCC CAGAGTGCTG GGTTGGGATT ACAGGCGTGA 
1951 GCCACCGCGT CCGGCTCAGG ACCCAGTTTT GGCTGCTGGT TCCCAGCAGG 
2001 GGACTCGGGG GATATACAGT GGCTGCACCA AATTGGAGGT GTGGGTTCCT 
2051 CCAACACAAT TTGCTTCTGC CCGTTGTCTT CCTGCCAGCT GGGTTTGGCC 
2101 AGGATTTCTC CGTGTGGGGG CTACATGCGA CCCTCTCCCC TCCTCCCTGA 
2151 CTTTAGAGGC TGGTGCTGTG TCGGGAGGAA GGTCAGGGCT CCTGAGCAGC 
2201 AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 9 bp to 1346 bp; peptide length: 446 
Category: similarity to known protein 
Classification: unset 
Prosite motifs: WD_REPEATS (323-338) 



1 MAARKGRRRT CETGEPMEAE SGDTSSEGPA QVYLPGRGPP LREGEELVMD 
51 EEAYVLYHRA QTGAPCLSFD IVRDHLGDNR TELPLTLYLC AGTQAESAQS 
101 NRLMMLRMHN LHGTKPPPSE GSDEEEEEED EEDEEERKPQ LELAMVPHYG 
151 GINRVRVSWL GEEPVAGVWS EKGQVEVFAL RRLLQVVEEP QALAAFLRDE 
201 QAQMKPIFSF AGHMGEGFAL DWSPRVTGRL LTGDCQKNIH LWTPTDGGSW 
251 HVDQRPFVGH TRSVEDLQWS PTENTVFASC SADASIRIWD IRAAPSKACM 
301 LTTVTAHDGD VNVISWSRRE PFLLSGGDDG ALKIWDLRQF KSGSPVATFK 
351 QHVAPVTSVE WHPQDSGVFA ASGADHQITQ WDLAVERDPE AGDVEADPGL 
401 ADLPQQLLFV HQGETELKEL HWHPQCPGLL VSTALSGFTI FRTISV 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2cl7, frame 3 

TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat 
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic 
sequence, complete sequence., N = 1, Score = 910, P = 2.7e-91 

PIR:S53061 hypothetical protein YMR131c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 691, P = 4.3e-68 

PIR:I49367 retinoblastoma-binding protein mRbAp46 - mouse, N = 1, Score 
= 338, P = 1.1e-30 

PIR:I39181 retinoblastoma-binding protein RbAp46 - human, N = 1, Score 
= 338, P = 1.1e-30 



>TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat 

protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, 
complete sequence. 

Length = 469 

HSPs: 



Score = 910 (136.5 bits), Expect = 2.7e-91, P = 2.7e-91 
Identities = 195/442 (44%), Positives = 259/442 (58%) 
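
The percentages printed beside the identity and positive counts in these reports look truncated rather than rounded: 259/442 is about 58.6% but is shown as 58%. The short sketch below reproduces the printed values with integer division. The function name `blast_percent` is hypothetical, and truncation is an inference from the figures in this document rather than a documented property of the BLAST version used.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    return (100 * matches) // aligned


# (matches, aligned length) pairs taken from the HSP summaries in this document.
for matches, aligned in [(195, 442), (259, 442), (59, 234), (116, 234)]:
    print(f"{matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```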



Query: 18 EAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRAQTGAPCLSFDIVRDHLG 77 
EA S + S P +V+ PG L +GEEL D AY H G PCLSFDI+ D LG 
Sbjct: 18 EASSSEIPSI-PTRVWQPGVDT-LEDGEELQCDPSAYNSLHGFHVGWPCLSFDILGDKLG 75 

Query: 78 DNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKP---PPSEGSDEEEEEEDEED- 133 
NRTE P TLY+- AGTQAE A N + + ■)-+ N+ G + P + G+ E+E+E+DE+D 
Sbjct: 76 LNRTEFPHTLYMVAGTQAEKAAHNSIGLFKITNVSGKRRDVVPKTFGNGEDEDEDDEDDS 135 

Query: 134 EEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFALRRLLQ 185 
E + P.+++ V H+G +NR+R + W++ G V+V+ + L 
Sbjct: 136 DSDDDDGDEASKTPNIQVRRVAHHGCVNRIRAMPQNSH-ICVSWADSGHVQVWDMSSHLN 194 

Query: 186 VVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIHLWTPT 245 
+ E + P+ +F+GH EG+A+DWSP GRLL+GDC+ IHLW P 
Sbjct: 195 ALAESETEGKDGTSPVLNQAPLVNFSGHKDEGYAIDWSPATAGRLLSGDCKSMIHLWEPA 254 

Query: 246 DGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACMLTTVT 305 
G SW VD PF GHT SVEDLQWSP E VFASCS D S+ +WDIR S A + 
Sbjct: 255 SG-SWAVDPIPFAGHTASVEDLQWSPAEENVFASCSVDGSVAVWDIRLGKSPAL---SFK 310 

Query: 306 AHDGDVNVISWSRREPFLL-SGGDDGALKIWDLRQFKSGSPV-ATFKQHVAPVTSVEWHP 363 
AH+ DVNVISW+R +L SG DDG I DLR KG V A F+ H P+TS+EW 
Sbjct: 311 AHNADVNVISWNRLASCMLASGSDDGTFSIRDLRLIKGGDAVVAHFEYHKHPITSIEWSA 370 

Query: 364 QDSGVFAASGADHQITQWDLAVERDPE------AGDVEADPGLADLPQQLLFVHQGETEL 417 
++ A + D+Q+T WDL++E+D E A E DLP QLLFVHQG+ +L 
Sbjct: 371 HEASTLAVTSGDNQLTIWDLSLEKDEEEEAEFNAQTKELVNTPQDLPPQLLFVHQGQKDL 430 

Query: 418 KELHWHPQCPGLLVSTALSGFTIFRTISV 446 
KELHWH Q PG+++STA GF I ++ 
Sbjct: 431 KELHWHNQIPGMIISTAGDGFNILMPYNI 459 



Pedant information for DKFZphfbr2_2cl7, frame 3 

Report for DKFZphfbr2_2cl7.3 

[LENGTH] 446 
[MW] 49447.38 
[pI] 4.82 
[HOMOL] TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, complete sequence, 1e-90 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR131c] 4e-65 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YEL056w] 4e-15 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YEL056w] 4e-15 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 4e-15 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 2e-13 
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195C] 2e-13 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 2e-13 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 2e-13 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 2e-13 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YPR178w] 1e-11 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-11 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 4e-09 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 4e-09 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 5e-09 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 5e-09 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 6e-09 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-08 
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-08 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YLR429w] 3e-07 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 3e-06 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 3e-06 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 3e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 4e-06 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-06 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021C] 4e-06 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-05 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 2e-05 
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 2e-05 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 2e-05 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 3e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 5e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 5e-05 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 5e-05 
[BLOCKS] BL00678 
[SCOP] d2trcb_ 2.51.3.1.1 Transducin (heterotrimeric G protein), gamm 5e-29 
[PIRKW] plasma 6e-07 
[PIRKW] duplication 4e-12 
[PIRKW] hormone 6e-07 
[PIRKW] transmembrane protein 1e-07 
[PIRKW] stomach 6e-07 
[PIRKW] actin binding 1e-07 
[PIRKW] leucine zipper 1e-07 
[PIRKW] signal transduction 2e-06 
[PIRKW] heterotrimer 2e-06 
[PIRKW] peripheral membrane protein 6e-07 
[PIRKW] GTP binding 2e-06 
[SUPFAM] WD repeat homology 1e-63 
[SUPFAM] yeast coatomer complex alpha chain 1e-07 
[SUPFAM] GTP-binding regulatory protein beta chain 4e-07 
[SUPFAM] PRL1 protein 8e-09 
[SUPFAM] MSI1 protein 4e-12 
[SUPFAM] coatomer complex beta 1 chain 1e-09 
[PROSITE] WD_REPEATS 1 
[PFAM] WD domain, G-beta repeats 
[KW] All_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.14 % 

SEQ MAARKGRRRTCETGEPMEAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRA 

SEG 

IgotB 

SEQ QTGAPCLSFDIVRDHLGDNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKPPPSE 

SEG 

IgotB 

SEQ GSDEEEEEEDEEDEEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFAL 
SEG . . XXXXXXXXXXXXXX 

IgotB 

SEQ RRLLQVVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIH 

SEG 

IgotB EEECCCCCEEEEEETTT-TCEEEEEETTTEEE 

SEQ LWTPTDGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACM 

SEG 

IgotB EEETTTT CEEEEEECCCCCEEEEEEETTTCE-EEEEETTTEEEEEETTT--TEEEE 

SEQ LTTVTAHDGDVNVISWSRREPFLLSGGDDGALKIWDLRQFKSGSPVATFKQHVAPVTSVE 

SEG 

IgotB EECBTTBTCCEEEEEETTTTTEEEEEETTTEEEEEE 

SEQ WHPQDSGVFAASGADHQITQWDLAVERDPEAGDVEADPGLADLPQQLLFVHQGETELKEL 

SEG 

IgotB 

SEQ HWHPQCPGLLVSTALSGFTIFRTISV 

SEG 

IgotB 

Prosite for DKFZphfbr2_2cl7.3 
PS00678 323->338 WD_REPEATS PDOC00574 

Pfam for DKFZphfbr2_2cl7.3 

HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

++GH+ V ++ +SP + +++S S D ++R+WD 
Query 257 FVGHTRSVEDLQWSPTENTVFASCSADASIRIWD 290 

24.88 304 336 1 34 dkfzphfbr2_2cl7.3 similarity to YMR131C and retinoblastoma-binding protein RbAp46 

Alignment to HMM consensus: 
Query *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 
+ H+++V+ +++S + ++SG++DG +++WD 

dkfzphfbr2 304 VTAHDGDVNVISWSRREPF-LLSGGDDGALKIWD 336 
DKFZphfbr2_2cl8 



group: brain associated 

DKFZphfbr2_2cl8 encodes a novel 302 amino acid protein with weak similarity to 
cyclin-dependent kinase p130-PITSLRE. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

weak similarity to cyclin-dependent kinase p130-PITSLRE 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2835 bp 

Poly A stretch at pos. 2817, polyadenylation signal at pos. 2796 

1 TGGGGCGGAC GGCGAGGGAG TCCAGAGCCT TGAGCCCGGT GCTCCTCCCT 

51 CGCGCAGCGG TGGCTCTGCG GCCGCTGGAG TAAACACTGC CTTTGTTCCC 

101 TAGCGCCTCG TCTTTCGTCG CCCCGTGCCC TCACGCCGCC GGGCTCTGGC 

151 CGGCCCGCCC TCGGTCCTTG AACCCCATTT CGGCTCGTGC CGTGCGGATG 

201 CAGCTGCCGG GCCTGGGTTT GGGCATTGAG CGGGAGGAGG AGGAGGAGCG 

251 GCGGCGCCTG GGCGGCATGC GATGGGGAAC TGCTGCTGGA CGCAGTGCTT 

301 CGGACTGCTT CGCAAGGAAG CGGGGCGGCT GCAGCGAGTA GGCGGCGGCG 

351 GAGGATCCAA GTATTTTAGA ACATGCTCAA GAGGTGAGCA CTTGACAATA 

401 GAGTTTGAGA ATCTAGTAGA AAGTGATGAA GGGGAGAGCC CAGGAAGCAG 

451 TCATAGGCCT CTTACTGAGG AAGAAATTGT TGACCTAAGA GAAAGGCATT 

501 ATGATTCCAT TGCCGAAAAA CAAAAAGATC TTGATGAGAA AATTCAAAAA 

551 GAGTTAGCCT TACAAGAAGA GAAGTTAAGA CTAGAAGAAG AAGCTTTATA 

601 CGCTGCACAG CGTGAAGCAG CCAGGGCAGC AAAGCAGCGA AAGCTCTTGG 

651 AGCAAGAAAG GCAGAGAATT GTGCAGCAAT ATCATCCTTC CAACAATGGA 

701 GAATATCAAA GTTCAGGACC AGAAGATGAC TTCGAATCTT GTTTGAGAAA 

751 TATGAAGTCA CAGTATGAAG TTTTTCGAAG TAGTAGACTC TCATCAGATG 

801 CTACAGTTTT GACACCAAAT ACAGAAAGCA GTTGTGATTT AATGACCAAA 

851 ACTAAATCAA CTAGTGGAAA TGACGACAGC ACATCCTTAG ATCTAGAGTG 

901 GGAAGATGAA GAAGGAATGA ATAGAATGCT TCCAATGAGA GAACGTTCCA 

951 AAACAGAGGA AGACATTCTA CGGGCAGCAC TTAAGTATAG CAACAAGAAG 

1001 ACTGGAAGTA ATCCTACATC AGCCTCTGAT GATTCCAATG GGCTGGAGTG 

1051 GGAAAATGAT TTTGTTAGTG CCGAAATGGA TGATAATGGA AATTCCGAGT 

1101 ATTCTGGATT TGTAAATCCT GTATTAGAAC TGTCTGATTC TGGCATAAGG 

1151 CATTCTGACA CAGATCAACA GACTCGATAG GGTAAAATTG TGTGACCTTG 

1201 TTTATCAGTT ATGACCAAAT GTTAAAAACC AACTAGAATG TATAAGTGAT 

1251 TGTGCTTAGC CTTTTTGTAA GGGAGATGTG TAAGAAACCA TGCTGTAAAT 

1301 GCTTATTTTA TTACAAAGGA GTAGGGATGA TAGGATCTGA ATTGATACAG 

1351 AATTAAGTGC AATTTCATCA TCTGCCTTCT GCTTTTCAAG ACCAATTTAA 

1401 TGGTCCTGTC ATGTTACTGA TTAAATTTAC TTTGTCTTGT CTTTATAGCA

1451 TTTCTGTTTA CTATGGTAGA TTTCCACTTT CAATTTTTAA AATTAATTTT 

1501 ACTTTGAATG ATTTATGAAG CCTATTTCAT TGTCTAACTA TGAAAATATT 

1551 AAGACTTTTT TGTTAATTCT CAGCCGATGT GAAGGAAGCA TGAGGAGGGA 

1601 TCGTCAGACT CAGATTTAGA ATAGTGTTCC CGTTTCCAGC ATTATTTATT 

1651 TCTATGACTT CTTTGGATTT TATTATCTAA TAGTAAGTAC AGTTGATGTG 

1701 GGTAGATGAC TCTAAGAAAT GCTGAAGTAT CGGCATTACA TGTGTTTATT 

1751 TACATGTCCT AGTTTGATAA TGTTGATTCA ATCTGAACAA AAGATAATAT
1801 AAAAATAACC CTTCAGAGTT TGGACATTTC AAGTTGGTAA TAATAAAAAA 

1851 TAATATTTAA GAAGATATAT ATATATATAT ATTTAGTTTT TTCCACTTCA
1901 TTTTACATGC CACTATATTG ACTTTAATTG ATATACAGTA TTAAGTTTTT 
1951 AGGTGCCATT ATTTTTAAAA AATTCTATAT TTCCAATGAA CGATGTTAGA 
2001 TTTTACACAG AACATATTCT CTGCATGATT TCAGAAAAGA AAATCTAAAA 
2051 AGGTAATACG GGTATTTCAA ATAAAATCCT TTCTGGTATG AAAGGCTCCA 
2101 TTGATTTTAT TAAGCCTTCC TTTACCTTGT AGTACAAGGT GCTTTAATGG 
2151 GATAGAACTA AGCATATCAA TATCTATAAC TGCATTTTGT GCTAGACAAT 
2201 TACTGTTCTT TTCTCTAAAA TGTATATGTC AATTTACAAG GCCAGGGATA 
2251 GAAAACACTC CATAATTGCT TTCCTTGATT TTGCTGAGGA TTTGGTATGA 
2301 TTTTAGTAAG CAAACTGTTT TTTGGTTTTT CCTTAATGTT TTTAATTTTT 
2351 TTTCCTCTTG CAACAATGAC GGTGCATGTT CTTATAAATA TAGGAAGGTC 
2401 CAGATATAAA TAGTAACCTA AAGTTCTTGC TGTGCTTAAA AAAAAAAATC
2451 ATGTGGCTCT TTCAATATTT GAACTGCTAA GCAATGACAT CTGTAGTTTT
2501 ATCTCCTTTT TTATGTCATA GAAATTAATA TGATACTTTA AATATGTAAA 
2551 TATAATACAT TGGTAATGCT ATTATTTATA TCTGTCTTAA CATAATTTAA 
2601 GTTGTAGCTG TGTCTTGGAA ATATTTTTAA GGTAATCTAT ATTCACATTG 
2651 CCTGTGTTAA TGCTTTTTAA GGTTTGTATA CATCAGATGT ATATTTTTGG 
2701 TTTGGCATAA GCTACGATTG TAATTTTTCT TGGCTTTTTG TTCATAAAGA
2751 ATTTTTTGAA GGAATGGTAA CAAATGGTAA TTTACAAATG GTTGTGAATA 
2801 AACACATTTT TACACTTAAA AAAAAAAAAA AAAAA
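The listing layout used here (a position label followed by five blocks of ten bases) can be regenerated from a raw sequence, which is a convenient way to check the position labels. This is a minimal sketch; `format_listing` is an illustrative name, and the demo string is the first 60 bases of the insert as printed above.

```python
def format_listing(seq: str, block: int = 10, per_line: int = 5) -> list[str]:
    """Split a sequence into labelled lines of `per_line` blocks of `block` bases."""
    width = block * per_line
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The label is the 1-based position of the first base on the line.
        lines.append(f"{start + 1} " + " ".join(blocks))
    return lines


demo = "TGGGGCGGACGGCGAGGGAGTCCAGAGCCTTGAGCCCGGTGCTCCTCCCTCGCGCAGCGG"
for line in format_listing(demo):
    print(line)
```

Each label is 50 higher than the one before it, so a label that breaks this progression points to a transcription error rather than a change in the sequence.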



BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 272 bp to 1177 bp; peptide length: 302 
Category: similarity to known protein 
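The ORF coordinates and the peptide length are tied together by simple arithmetic: the coordinates are inclusive and, in this report, do not count the stop codon. A minimal sketch of that consistency check follows; `peptide_length` is an illustrative helper, not part of any annotation tool.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given inclusive 1-based start and end positions."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of three")
    return span // 3


print(peptide_length(272, 1177))  # 302
```

A span that is not a multiple of three raises an error, which flags coordinates that cannot correspond to the stated peptide length.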



1 MGNCCWTQCF GLLRKEAGRL QRVGGGGGSK YFRTCSRGEH LTIEFENLVE 

51 SDEGESPGSS HRPLTEEEIV DLRERHYDSI AEKQKDLDEK IQKELALQEE

101 KLRLEEEALY AAQREAARAA KQRKLLEQER QRIVQQYHPS NNGEYQSSGP 

151 EDDFESCLRN MKSQYEVFRS SRLSSDATVL TPNTESSCDL MTKTKSTSGN 

201 DDSTSLDLEW EDEEGMNRML PMRERSKTEE DILRAALKYS NKKTGSNPTS 

251 ASDDSNGLEW ENDFVSAEMD DNGNSEYSGF VNPVLELSDS GIRHSDTDQQ 
301 TR 

BLASTP hits 

Entry A55817 from database PIR: 
cyclin-dependent kinase p130-PITSLRE - mouse
Length = 783 

Score = 123 (43.3 bits), Expect = 0.00013, P = 0.00013 
Identities = 53/197 (26%), Positives = 96/197 (48%) 
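The percentages in the BLASTP summary follow from the match counts. The sketch below assumes the percentages are truncated to whole numbers rather than rounded, which is inferred from the figures in this report (53/197 is about 26.9 %, reported as 26 %); `blast_percent` is an illustrative name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matches over the aligned length, truncated."""
    return 100 * matches // aligned


print(blast_percent(53, 197))  # identities -> 26
print(blast_percent(96, 197))  # positives  -> 48
```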



Alert BLASTP hits for DKFZphfbr2_2cl8, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2cl8 , frame 2 

Report for DKFZphfbr2_2cl8.2



[LENGTH] 302

[MW] 34281.39

[pI] 4.73

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 3

[KW] All_Alpha

[KW] LOW_COMPLEXITY 13.58 %

[KW] COILED_COIL 13.58 %

SEQ MGNCCWTQCFGLLRKEAGRLQRVGGGGGSKYFRTCSRGEHLTIEFENLVESDEGESPGSS 

SEG xxxxx 

PRD ccccccccchhhhhhhhhheeecccccccceeeeccccccchhhhhhhhccccccccccc 

COILS 

SEQ HRPLTEEEIVDLRERHYDSIAEKQKDLDEKIQKELALQEEKLRLEEEALYAAQREAARAA

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ KQRKLLEQERQRIVQQYHPSNNGEYQSSGPEDDFESCLRNMKSQYEVFRSSRLSSDATVL 

SEG xxxxxxx 

PRD hhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhheeeeecccccceeee 

COILS CCCCCCCCC 
SEQ TPNTESSCDLMTKTKSTSGNDDSTSLDLEWEDEEGMNRMLPMRERSKTEEDILRAALKYS

SEG 

PRD ccccccccccccccccccccccccchhhhhhhccccccchhhhhhhcchhhhhhhhhhhc 

COILS 

SEQ NKKTGSNPTSASDDSNGLEWENDFVSAEMDDNGNSEYSGFVNPVLELSDSGIRHSDTDQQ 

SEG 

PRD cccccccccccccccccccccccceeeecccccccccccccceeeecccccccccccccc 

COILS 

SEQ TR 
SEG 

PRD CC 
COILS 
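The LOW COMPLEXITY figure in the report can be related to the SEG track above: it is the share of residues masked with `x`. The sketch below is an illustration under the assumption that the three masked runs are 5, 29 and 7 residues long, as read from the SEG lines; `masked_percent` is a hypothetical helper name.

```python
def masked_percent(mask_runs: list[str], length: int, symbol: str = "x") -> float:
    """Percentage of a sequence of `length` residues covered by masked runs."""
    masked = sum(run.count(symbol) for run in mask_runs)
    return round(100 * masked / length, 2)


# Run lengths as read from the three non-empty SEG lines (assumed: 5, 29, 7).
seg_runs = ["x" * 5, "x" * 29, "x" * 7]
print(masked_percent(seg_runs, 302))  # 13.58
```

With 41 masked residues out of 302 the sketch gives 13.58 %, matching the reported value.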



Prosite for DKFZphfbr2_2cl8.2

PS00005 60->63 PKC_PHOSPHO_SITE PDOC00005
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005
PS00005 240->243 PKC_PHOSPHO_SITE PDOC00005
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 65->69 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 198->202 CK2_PHOSPHO_SITE PDOC00006
PS00006 204->208 CK2_PHOSPHO_SITE PDOC00006
PS00006 226->230 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00008 24->30 MYRISTYL PDOC00008
PS00008 25->31 MYRISTYL PDOC00008
PS00008 199->205 MYRISTYL PDOC00008
PS00008 245->251 MYRISTYL PDOC00008
PS00008 291->297 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_2cl8.2)
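The phosphorylation-site entries in the Prosite report can be reproduced by scanning the frame 2 peptide with the corresponding PROSITE patterns, PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]). The following is a minimal sketch rather than the Pedant implementation. It assumes the report's `start->end` convention is a 1-based start with `end = start + pattern length`, which is inferred from the entries above; the `scan` helper and the regular-expression translations are illustrative.

```python
import re

# Frame 2 peptide of DKFZphfbr2_2cl8, joined from the SEQ lines of the report.
PROTEIN = (
    "MGNCCWTQCFGLLRKEAGRLQRVGGGGGSKYFRTCSRGEHLTIEFENLVESDEGESPGSS"
    "HRPLTEEEIVDLRERHYDSIAEKQKDLDEKIQKELALQEEKLRLEEEALYAAQREAARAA"
    "KQRKLLEQERQRIVQQYHPSNNGEYQSSGPEDDFESCLRNMKSQYEVFRSSRLSSDATVL"
    "TPNTESSCDLMTKTKSTSGNDDSTSLDLEWEDEEGMNRMLPMRERSKTEEDILRAALKYS"
    "NKKTGSNPTSASDDSNGLEWENDFVSAEMDDNGNSEYSGFVNPVLELSDSGIRHSDTDQQ"
    "TR"
)

PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",     # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",  # PS00006: [ST]-x(2)-[DE]
}


def scan(protein: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for every match, overlapping ones included."""
    # A lookahead is used so that overlapping sites are not skipped.
    rx = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in rx.finditer(protein)
    ]


for name, pattern in PATTERNS.items():
    for start, end in scan(PROTEIN, pattern):
        print(f"{name} {start}->{end}")
```

On this peptide the scan yields the three PKC sites and twelve CK2 sites listed in the report, which also serves as a check on the transcription of the positions.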
DKFZphfbr2_2dl5



group: differentiation/development

DKFZphfbr2_2dl5 encodes a novel 438 amino acid protein with strong similarity to Mus musculus
testis-specific Y-encoded-like protein (Tspyl1).

The TSPY genes are arranged in clusters on the Y chromosome of many mammalian species. TSPY is
believed to function in early spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y chromosome. The novel protein is a new member of the
TSPY-SET-NAP1L1 family, which comprises proteins closely related to TSPY. Therefore, the new
protein seems to be involved in early spermatogenesis.

The new protein can find application in modulating early spermatogenesis. 



strong similarity to testis-specific Y-encoded-like protein

complete cDNA, complete cds, EST hits 
localisation: primer B does not match perfectly



Sequenced by Qiagen 



Locus: /map="729.2 cR from top of Chr6 linkage group" 
Insert length: 3229 bp 

Poly A stretch at pos. 3206, polyadenylation signal at pos. 3184 



1 GGAGACTGTA GGGTGGGCGG TGCGAGCGGC GGTTAGCTCC CAGTTCGGCC
51 TCTGAGGAAA ACGGGCGTTC GCCTGCGGTT GGTCCGACTG TTAGCAACAT
101 GAGCGGCCTG GATGGGGTCA AGAGGACCAC TCCCCTCCAA ACCCACAGCA
151 TCATTATTTC TGACCAAGTC CCGAGCGACC AGGACGCACA CCAGTACCTG
201 AGGCTCCGCG ACCAAAGCGA GGCGACACAG GTGATGGCGG AGCCGGGTGA
251 GGGAGGCTCG GAGACCGTCG CGCTCCCGCC TTCACCGCCT TCAGAGGAGG
301 GGGGCGTACC CCAGGATCCC GCGGGCCGTG GCGGTACTCC CCAGATCCGA
351 GTTGTTGGGG GTCGCGGTCA TGTGGCGATC AAAGCCGGGC AGGAAGAGGG
401 CCAGCCTCCC GCCGAAGGCC TGGCAGCCGC TTCTGTGGTG ATGGCAGCCG
451 ACCGCAGCCT GAAAAAGGGC GTTCAGGGTG GAGAGAAGGC CCTAGAAATC
501 TGTGGCGCCC AGAGATCCGC GTCTGAGCTG ACGGCGGGGG CGGAGGCTGA
551 GGCGGAGGAG GTGAAGACAG GAAAGTGCGC CACCGTCTCA GCAGCCGTGG
601 CTGAGAGGGA GAGCGCTGAG GTGGTGGTGA AGGAAGGCCT GGCGGAGAAG
651 GAGGTAATGG AGGAGCAGAT GGAGGTAGAG GAGCAGCCGC CAGAAGGTGA
701 AGAAATAGAA GTGGCGGAGG AGGATAGATT GGAGGAGGAG GCGAGGGAGG
751 AAGAAGGGCC CTGGCCTTTG CATGAGGCTC TCCGCATGGA CCCTCTGGAG
801 GCCATCCAGC TGGAACTGGA CACTGTGAAT GCTCAGGCCG ACAGGGCCTT
851 CCAACAGCTG GAGCACAAGT TTGGGCGGAT GCGTCGACAC TACCTGGAGC
901 GGAGGAACTA CATCATTCAG AATATCCCGG GCTTCTGGAT GACTGCTTTT
951 CGAAACCACC CCCAGTTGTC CGCCATGATT AGGGGCCAAG ATGCAGAGAT
1001 GTTAAGGTAC ATAACCAATT TAGAGGTGAA GGAACTCAGA CACCCTAGAA
1051 CCGGTTGCAA GTTCAAGTTC TTCTTTAGAA GAAACCCCTA CTTCAGAAAC
1101 AAGCTGATTG TCAAGGAATA TGAGGTAAGA TCCTCCGGCC GAGTGGTGTC
1151 TCTTTCTACT CCAATTATAT GGCGCAGGGG GCATGAACCC CAGTCCTTCA
1201 TTCGCAGAAA CCAAGACCTC ATCTGCAGCT TCTTCACTTG GTTTTCAGAC
1251 CACAGCCTTC CAGAGTCCGA CAAAATTGCT GAGATTATTA AAGAGGATCT
1301 GTGGCCAAAT CCACTGCAAT ACTACCTGTT GCGTGAAGGA GTCCGTAGAG
1351 CCCGACGTCG CCCGCTAAGG GAGCCTGTAG AGATCCCCAG GCCCTTTGGG
1401 TTCCAGTCTG GTTAACATTT GCCCTTGGGA ATACTCCTGC ACAAGGTCTC
1451 CTACCACCTT CTGCTGGACC TGTGCTTGGG CATCAGCAAT GAGTATGCCT
1501 TCTATTGTGC TTTGTTTTTG CTGACTTTTC TGCACCCTGT TTCCTTTGGA
1551 TATTCAGTTC TCTCAACCTC AAGATTGAGA CGGTGGTGGG TATGCTTCTC
1601 CACTTCCATA TGACCTTCAT GCTGTTCTGG AATATCACAT GCTACGAGGT
1651 CATCCTTCAC ACTACTTGTA AGCCAAGCAA ATGATACTGT AGATTGTACT
1701 GCCTTTATCT GCACTGCTTG GACCCTGTTT ATTCCCAGGG CCTCTGAACT
1751 GGTTGCTGTC ACTTGGATTT CTAGCTTTGG GAGCCTGTTC CACCTACTCA
1801 GCTCTGCATT GAGCAGTATG GGCACATGCC CTGTGGACAG TTACTGGACG
1851 TTAATGAACT CAGAGGAGAA AAGCAGTGAG CCACTTGTTC TGTGTGATTT
1901 ATGGTACTTC ATTGCTCTTC CTTCACCTCT AGTCACTTTC TATTGCTACC
1951 TGCCCTACAT TGGCTCCTGC CAAGGTCCCT CTCTCTCCCT GTTTTCCTTT
2001 [bp 2001-2020 missing in source] TTTTGAGACG GAGGACGGAG TCTTGCTCTG
2051 TCGCCCAGGT TGGAGTGCAG TGGCGCGATC TCGGCTCACT GCAACCTCCA
2101 CCTCCCGGGT TCAAGCGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA
2151 CTACAGGCGC GCGCCGCCAC GCCCGGCTAA TTTTTATATT TTTAGTAGAG
2201 ACGGGGTTTC ACCATGCTGG CCAGGCTGGT CTCGAACCCC GACCTCGTGA
2251 TCCGCCCTCC TTAGCCTCCC AATCCTCTCT TAAAAAAGTG ATAGCTCAGA
2301 AATATTTGTA AAAGCAAGGT TTTTATTTCA TTTTGGCTCT GTCATTTTCA
2351 GAGGCAAAGA AGTTGGCCTG TAAAATAGAG TGCTAGAGCT CTTACGCCCC
2401 TCCCCTTCTT CCCAACTTCC TACTTCCTAG CCCTTTTATC AACTCCTAGA
2451 ATAGTTAAAG AGAGACACAT CTAGATGGGA TGAAAGGTGC CCTAAGCAGG
2501 AGAAACTGAA CAAAAGGCTA GAGGCATGGG CCAGGTAAAA ATTGGGCCTA
2551 GAGTGAAGAC TGTGCTGCCG TTAAGAGCTT TCGAGGAAGG AGTACTTACT
2601 CCCCAATGAT GATGAATGGA GAAATACTTT TCAGGGAGAA TTGAAGGGGT
2651 TAAAGTGTTA AATATGTTGC CTAGACAAGG GTTCTTTAAA GAAAGACAGC
2701 GCAACTTTGA ATGCTTTCTT ACTTGTTTTG TGACCTAATT TATGTGGAAG
2751 ATTGTTATTT CATTAGGATT TAGTAAAATT TTTTTTTCTG ATTCTAAACT
2801 TATTGTGAAA ATTGAGCTGT ACAGATATTC TTTTGATTTC AATTGGGAAC
2851 ATTTGGAAGA ACAACAGTCT TACTTGCCTG TACAATATAG AGACATATGA
2901 ATAGTCATAA CAGTTTTCAA CTTGTTCTTG TTTCTGTTAA ACTATATTCC
2951 TAGAAACATA GTTTGAACAA CTTGGTCTTT GTTAGGCTTG TCAAATTGCC
3001 TTCATGGAAA AATAATCTAC AAAAGTATGG TTTAATTGAT TGTCTTACAT
3051 GATAATTTTC CCTGGCAACA ACTTAGTAAG TGATATATCT TTTTTCCTAA
3101 ATTGCTTAAA TACTGTGAAA TTGCTCTGAC AAATTGGAAG TGTACCATTG
3151 GCATATTTGT CTTCCTTTTT ATGCATGATG GTAAAATAAA AGCATGTTGT
3201 TCTGCTAAGA AAAAAAAAAA AAAAAAAAA



BLAST Results 



Entry AF042181 from database EMBLNEW: 

Homo sapiens testis-specific Y-encoded-like protein (TSPYL) mRNA,
partial cds.

Score = 3411, P = 6.9e-148, identities = 685/687 

Entry HS938343 from database EMBL:
human STS WI-11947. 
Score = 1195, P = 2.1e-46, identities = 273/299 



Medline entries 



98399864: 

Murine and human TSPYL genes: novel members of the TSPY-SET-NAP1L1 family 



Peptide information for frame 3 



ORF from 99 bp to 1412 bp; peptide length: 438 
Category: strong similarity to known protein 
Classification: Differentiation/Development 



1 MSGLDGVKRT TPLQTHSIII SDQVPSDQDA HQYLRLRDQS EATQVMAEPG 
51 EGGSETVALP PSPPSEEGGV PQDPAGRGGT PQIRVVGGRG HVAIKAGQEE 
101 GQPPAEGLAA ASVVMAADRS LKKGVQGGEK ALEICGAQRS ASELTAGAEA 
151 EAEEVKTGKC ATVSAAVAER ESAEVVVKEG LAEKEVMEEQ MEVEEQPPEG
201 EEIEVAEEDR LEEEAREEEG PWPLHEALRM DPLEAIQLEL DTVNAQADRA
251 FQQLEHKFGR MRRHYLERRN YIIQNIPGFW MTAFRNHPQL SAMIRGQDAE
301 MLRYITNLEV KELRHPRTGC KFKFFFRRNP YFRNKLIVKE YEVRSSGRVV
351 SLSTPIIWRR GHEPQSFIRR NQDLICSFFT WFSDHSLPES DKIAEIIKED 
401 LWPNPLQYYL LREGVRRARR RPLREPVEIP RPFGFQSG 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2dl5, frame 3

TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific
Y-encoded-like protein"; Mus musculus testis-specific Y-encoded-like
protein (Tspyl1) mRNA, complete cds., N = 1, Score = 1202, P = 3.1e-122

TREMBL:AB018264_1 gene: "KIAA0721"; product: "KIAA0721 protein"; Homo 
sapiens mRNA for KIAA0721 protein, partial cds., N = 1, Score = 798, P 
= 2e-79 

TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA, 
partial cds., N = 1, Score = 570, P = 2.9e-55 



>TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyl1)
mRNA, complete cds.

Length = 379 

HSPs : 
Score = 1202 (180.3 bits), Expect = 3.1e-122, P = 3.1e-122
Identities = 258/377 (68%), Positives = 283/377 (75%) 

Query: 62 SPPSEEGGVPQDPAGR GGT PQI RVVGGRGHVAI KAGQEE- -GQP- P- - AEGLAA 110 

SP +EG D G GTP R + G G+ GPP EGL 

Sbjct: 3 SPERDEGTPVPDSRGHCDADTVSGTPDRRPLLGEEKAVTGEGRAGIVGSPAPRDVEGLVP 62 

Query: 111 ASVVMAADRSLKK-GVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAE 169 

V AA + V+G A+ + ++ T GAE++A +VKT + TV+AA 

Sbjct: 63 QI RVAAARQGES P PS VRGPAAA V FVTPKYVEKAQETRGAESQARDVKT -E PGT VAAAA — 119 

Query: 170 RESAEVVVKEGLAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALR 229 

E +EV EE MEVE Q P GEE+E+ E EA EE GPW L LR 

Sbjct: 120 -EKSEVATPGS EEVMEVE-QKPAGEEMEMLEASGGVREAPEEAGPWHLGIDLR 170 

Query: 230 MDPLEAIQLELDTVNAQADRAFQQLEHKFGRMRRHYLERRNYI IQNIPGFWMTAFRNHPQ 289 

+PLEAIQLELDTVNAQADRAFQ LE KFGRMRRHYLERRNYI IQNIPGFWMTAFRNHPQ 
Sbjct: 171 RNPLEAIQLELDTVNAQADRAFQHLEQKFGRMRRHYLERRNYI IQNIPGFWMTAFRNHPQ 230 

Query: 290 LSAMIRGQDAEMLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 349 

LSAMIRG+DAEMLRY+T+LEVKELRHP+TGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 
Sbjct: 231 LSAMIRGRDAEMLRYVTSLEVKELRHPKTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 290 

Query: 350 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYY 409 

VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESD+IAEIIKEDLWPNPLQYY
Sbjct: 291 VSLSTPIIWRRGHEPQSFIRRNQDLICSFFTWFSDHSLPESDRIAEIIKEDLWPNPLQYY 350

Query: 410 LLREGVRRARRRPLREPVEIPRPFGFQSG 438

L REG+RR RRRP+REPVEIPRPFGFQSG
Sbjct: 351 LCREGIRRPRRRPIREPVEIPRPFGFQSG 379

Pedant information for DKFZphfbr2_2dl5, frame 3 



Report for DKFZphfbr2_2dl5.3

[LENGTH] 438

[MW] 49307.65

[pI] 5.36

[HOMOL] TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyl1) mRNA, complete cds.
107

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YKR048c] 1e-07

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR048c] 1e-07

[BLOCKS] BL00376F 

[PIRKW] nucleus 6e-39 

[PIRKW] DNA binding 3e-06 

[PIRKW] phosphoprotein 6e-39 

[PIRKW] alternative splicing 6e-39 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 22.83 % 

SEQ MSGLDGVKRTTPLQTHSIIISDQVPSDQDAHQYLRLRDQSEATQVMAEPGEGGSETVALP
SEG X 



PRD ccccccccccccccceeeeecccccccccchhhhhhhhchhhhhcccccccccceeeecc 

SEQ PSPPSEEGGVPQDPAGRGGTPQIRVVGGRGHVAIKAGQEEGQPPAEGLAAASVVMAADRS 

SEG xxxxxxxxx 

PRD ccccccccccccccccccccceeeeecccceeeeecccccccccchhhhhhhhhhhhhcc 

SEQ LKKGVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAERESAEVVVKEG 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD ccccccccccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALRMDPLEAIQLEL 

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

SEQ DTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQLSAMIRGQDAE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeecccccccccccccchhh 

SEQ MLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRVVSLSTPIIWRR
SEG

PRD hhhhhhhhhhhhhcccccceeeeeeeccccccchhhhhhccccccccccccccceeeecc 

SEQ GHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYYLLREGVRRARR 

SEG xxxxxxxxxxx 

PRD ccccchhhhhhcccccceeeeeccccccccchhhhhhhhhcccccceeeeccccchhhhh 

SEQ RPLREPVEIPRPFGFQSG 

SEG xxxxxxxx 

PRD hccccccccccccccccc 

(No Prosite data available for DKFZphfbr2_2dl5.3)
(No Pfam data available for DKFZphfbr2_2dl5.3)






DKFZphfbr2_2dl7 



group: transmembrane proteins 

DKFZphfbr2_2dl7 encodes a novel 292 amino acid protein with similarity to a C. elegans
hypothetical protein.

One transmembrane region is predicted for the protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C. elegans hypothetical protein



TRANSMEMBRANE 1 
Sequenced by Qiagen 
Locus: unknown



Insert length: 1009 bp

Poly A stretch at pos. 990, polyadenylation signal at pos. 969



1 TGGGCCTGTG GCTGGGGGCA GAGCTCAGAC TGTCTTCTGA AGATTGATGT
51 CTATTTCCTT GAGCTCTTTA ATTTTGTTGC CAATTTGGAT AAACATGGCA
101 CAAATCCAGC AGGGAGGTCC AGATGAAAAA GAAAAGACTA CCGCACTGAA
151 AGATTTATTA TCTAGGATAG ATTTGGATGA ACTAATGAAA AAAGATGAAC
201 CGCCTCTTGA TTTTCCTGAT ACCCTGGAAG GATTTGAATA TGCTTTTAAT
251 GAAAAGGGAC AGTTAAGACA CATAAAAACT GGGGAACCAT TTGTTTTTAA
301 CTACCGGGAA GATTTACACA GATGGAACCA CAAAAGATAC GAGGCTCTAG
351 GAGAGATCAT CACGAAGTAT GTATATGAGC TCCTGGAAAA GGATTGTAAT
401 TTGAAAAAAG TATCTATTCC AGTAGATGCC ACTGAGAGTG AACCAAAGAG
451 TTTTATCTTT ATGAGTGAGG ATGCTTTGAC AAATCCACAG AAACTGATGG
501 TTTTAATTCA TGGTAGTGGT GTTGTCAGGG CAGGGCAGTG GGCTAGAAGA
551 CTTATTATAA ATGAAGATCT GGACAGTGGC ACACAGATAC CGTTTATTAA
601 AAGAGCTGTG GCTGAAGGAT ATGGAGTAAT AGTACTAAAT CCCAATGAAA
651 ACTATATTGA AGTAGAAAAG CCGAAGATAC ACGTACAGTC ATCATCTGAT
701 AGTTCAGATG AACCAGCAGA AAAACGGGAA AGAAAAGATA AAGTTTCTAA
751 AGTAACAAAG AAGCGACGTG ATTTCTATGA GAAGTATCGT AACCCCCAAA
801 GAGAAAAAGA AATGATGCAA TTGTATATCA GAGTGAGTGA GATCACTACT
851 TTCCTTTACT ATTTTCTTTA CCTTGTATAT ATTTTATTAT ATGTAGATTG
901 TTTTGTTTTT CTTCAAGAAT ATTAATTTCT TTATTTGTCA TCATTTATTT
951 CCCATGGTCG TCTACTTGGA TTAAATGGGT TTTTAAATTC AAAAAAAAAA
1001 AAAAAAAAA



BLAST Results 



Entry 189937 from database EMBL:

Sequence 11 from patent US 5723315. 

Score = 1083, P = 2.2e-42, identities = 223/231 



Entry 189938 from database EMBL: 

Sequence 12 from patent US 5723315. 

Score = 875, P = 7.4e-33, identities = 175/175 




Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 922 bp; peptide length: 292 
Category: similarity to unknown protein 
Classification: unset 



1 MSISLSSLIL LPIWINMAQI QQGGPDEKEK TTALKDLLSR IDLDELMKKD 
51 EPPLDFPDTL EGFEYAFNEK GQLRHIKTGE PFVFNYREDL HRWNQKRYEA
101 LGEIITKYVY ELLEKDCNLK KVSIPVDATE SEPKSFIFMS EDALTNPQKL
151 MVLIHGSGVV RAGQWARRLI INEDLDSGTQ IPFIKRAVAE GYGVIVLNPN
201 ENYIEVEKPK IHVQSSSDSS DEPAEKRERK DKVSKVTKKR RDFYEKYRNP 
251 QREKEMMQLY IRVSEITTFL YYFLYLVYIL LYVDCFVFLQ EY 

BLASTP hits 

Entry S67436 from database PIR: 

hypothetical protein - fission yeast (Schizosaccharomyces pombe) 
Length = 266 

Score = 112 (39.4 bits), Expect = 0.00037, P = 0.00037 
Identities = 33/147 (22%), Positives = 69/147 (46%) 

Entry CEY75B8A_12 from database TREMBLNEW: 

gene: "Y75B8A.31"; Caenorhabditis elegans cosmid Y75B8A 

Score = 327, P = 1.5e-29, identities = 72/140, positives = 93/140 



Alert BLASTP hits for DKFZphfbr2_2dl7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2dl7, frame 2 



Report for DKFZphfbr2_2dl7.2



[LENGTH] 292

[MW] 34260.50

[pI] 5.50

[HOMOL] TREMBLNEW:AF064782_1 product: "unknown"; Mus musculus clone pEN87 unknown mRNA,

partial cds. 1e-119
[KW] SIGNAL_PEPTIDE 19 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 10.96 % 

SEQ MSISLSSLILLPIWINMAQIQQGGPDEKEKTTALKDLLSRIDLDELMKKDEPPLDFPDTL 

SEG . xxxxxxxxxxxxxx 

PRD ccchhhhhhchhhhhhhccccccccccchhhhhhhhhhhhhcchhhhhhccccccccccc 

MEM 

SEQ EGFEYAFNEKGQLRHIKTGEPFVFNYREDLHRWNQKRYEALGEIITKYVYELLEKDCNLK

SEG 

PRD hhhhhhcccccceeeecccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhe 

MEM 

SEQ KVSIPVDATESEPKSFIFMSEDALTNPQKLMVLIHGSGVVRAGQWARRLIINEDLDSGTQ

SEG 

PRD eeeccccccccccceeeeeeccccccccceeeeeecccccchhhhhcccccccccccccc 

MEM 

SEQ IPFIKRAVAEGYGVIVLNPNENYIEVEKPKIHVQSSSDSSDEPAEKRERKDKVSKVTKKR 

SEG 

PRD chhhhhhhhccceeeeeccccceeeeeccceeeeccccccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RDFYEKYRNPQREKEMMQLYIRVSEITTFLYYFLYLVYILLYVDCFVFLQEY

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhcccchhhhhhhhhhhhheeeeehhhhhhhhhhhhheeeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMM 



(No Prosite data available for DKFZphfbr2_2dl7.2)
(No Pfam data available for DKFZphfbr2_2dl7.2)
DKFZphfbr2_2d20



group: brain derived 

DKFZphfbr2_2d20 encodes a novel 197 amino acid protein with similarity to Synechocystis sp.
P74594 hypothetical 32.8 kD protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to Synechocystis sp. (PCC 6803) 
complete cDNA, complete cds, EST hits 

potential start at bp 67 matches kozak consensus ANCatgG 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1787 bp

Poly A stretch at pos. 1768, polyadenylation signal at pos. 1743



1 TGGGGCGGCC GCGGCGGGAA CATGGAGGAG CTGCTGAGGC GCGAGCTGGG 

51 CTGCAGCTCT GTCAGGGCCA CGGGCCACTC GGGGGGCGGG TGCATCAGCC 

101 AGGGCCGGAG CTACGACACG GATCAAGGAC GAGTGTTCGT GAAAGTGAAC 

151 CCCAAGGCGG AGGCCAGAAG AATGTTTGAA GGTGAGATGG CAAGTTTAAC 

201 TGCCATCCTG AAAACAAACA CGGTGAAAGT GCCCAAGCCC ATCAAGGTTC 

251 TGGATGCCCC AGGCGGCGGG AGCGTGCTGG TGATGGAGCA CATGGACATG 

301 AGGCATCTGA GCAGTCATGC TGCAAAGCTT GGAGCCCAGC TGGCCGATTT 

351 ACACCTTGAT AACAAGAAGC TTGGAGAGAT GCGCCTGAAG GAGGCGGGCA 

401 CAGTGTGGAG AGGAGGTGGG CAGGAGGAAC GGCCCTTTGT GGCCCGGTTT 

451 GGATTTGACG TGGTGACGTG CTGTGGATAC CTCCCCCAGG TGAATGACTG 

501 GCAGGAGGAC TGGGTCGTGT TCTATGCCCG GCAGCGCATT CAGCCCCAGA 

551 TGGACATGGT GGAGAAGGAG TCTGGGGACA GGGAGGCCCT CCAGCTTTGG 

601 TCTGCTCTGC AGTAAAAGAT CCCTGACCTG TTCCGTGACC TGGAGATCAT 

651 CCCAGCCTTA CTCCACGGGG ACCTCTGGGG TGGAAACGTA GCAGAGGATT 

701 CCTCTGGGCC GGTGATTTTT GACCCAGCTT CTTTCTACGG CCACTCGGAA

751 TATGAGCTGG CAATAGCTGG CATGTTTGGG GGCTTTAGCA GCTCCTTTTA 

801 CTCCGCCTAC CACGGCAAAA TCCCCAAGGC CCCAGGATTC GAGAAGCGCC 

851 TTCAGTTGTA TCAGCTCTTT CACTACTTGA ACCACTGGAA TCATTTTGGA 

901 TCGGGGTACA GAGGATCCTC CCTGAACATC ATGAGGAATC TGGTCAAGTG 

951 AGCGGGCCTT ACTCTGGAAG GAGGTCTCAG AGGTTTCTCC ACAGTCCTCT 

1001 TCTGGGCAAA TTCTTGTTTC TTCACATGCC GGACTAGCTT AAGACCAATG 

1051 CAGTAGCTTA TTTCCAAGCC TTGCAAAGTA TATAATATCT AAGAGGAAAG 

1101 GTTTTGTCAT CCCAGCGTTG TCCACTTTGT GGGGCTTTGT AGGTAGACGG 

1151 AGCCACACTA CAGGCAGGGT ATGAGCAGAG GGATGTATGG AGTGTGGGCG 

1201 ACTCTGAGCC TCACTGCTGC TGCAAGGTGG GGAAACTGTA AGTGAACCCC 

1251 TGTGGGTGCG GGGGAGGGTA TCCGGTGCGC AGGGAGGTGG CCAGCGCCCC 

1301 CGGGCACTGC TGCTCATAGG TACCTTTCCG CTGCCTCCTC CCTGCTCTCC 

1351 TGTGCAGGAA TGTCTCTGAG CTGTTCACGT TGATGCTTCT TGGTTGGCAA 

1401 GACTTGGGTG TAGACATGAA ACCACCTTAC TAAAAGCGTC TTAAAATGAC

1451 CAATTCCAGA ATCAAGCGTA TTCCGTTTTC CTCCTGCATG ATCCCTGGGC

1501 CCTCCCGCAG GCTGAGCAAG TCTGTAAACT GATTCTGGGA GAAACCAAGC 

1551 TGCTGGCCGT AGGATGTCCT TGGGTACATC CAGGAGTCTT CATTGCTTCT 

1601 GTTATTACCC CGTCTCCTCT GCCATTTTCT ACAGCTTGCT GAGTTGTCAT 

1651 TCCTTTGCAA CATTAAAATA CATGCTGAAC TCATATTTTT CCTTCCTTCA 

1701 CTGTTGTAGT AAAGAGACAT ATTTCATGAA TGGCATTGAT GCTAATAAAC 

1751 CCTTTGCCCA AAAATTTGAA AAAAAAAAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 
ORF from 22 bp to 612 bp; peptide length: 197
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (117-139) 



1 MEELLRRELG CSSVRATGHS GGGCISQGRS YDTDQGRVFV KVNPKAEARR 
51 MFEGEMASLT AILKTNTVKV PKPIKVLDAP GGGSVLVMEH MDMRHLSSHA 
101 AKLGAQLADL HLDNKKLGEM RLKEAGTVWR GGGQEERPFV ARFGFDVVTC 
151 CGYLPQVNDW QEDWVVFYAR QRIQPQMDMV EKESGDREAL QLWSALQ
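The peptide above is the frame 1 translation of the insert, starting at the ATG at bp 22. As a minimal sketch of that step, the block below translates a stretch of DNA with the standard genetic code. The table is built from the usual TCAG ordering, `translate` is an illustrative helper, and `orf_head` is bases 22-51 of the insert as printed in the listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


orf_head = "ATGGAGGAGCTGCTGAGGCGCGAGCTGGGC"  # bp 22-51 of the insert
print(translate(orf_head))  # MEELLRRELG
```

The output matches the first ten residues of the listed peptide; translating the full insert from bp 22 in the same way would run until the first in-frame stop codon.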

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2d20, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2d20, frame 1 



Report for DKFZphfbr2_2d20.1



[LENGTH] 197 

[MW] 21963.25 

[pI] 6.96

[HOMOL] PIR:S76790 hypothetical protein - Synechocystis sp. (strain PCC 6803) 9e-12

[SUPFAM] hypothetical protein b1725 1e-06

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 2 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] Alpha_Beta 



SEQ MEELLRRELGCSSVRATGHSGGGCISQGRSYDTDQGRVFVKVNPKAEARRMFEGEMASLT 

PRD ccchhhhhccccceeeeccccccceeeccccccccceeeeeeccchhhhhhhhhhhhhhh 

SEQ AILKTNTVKVPKPIKVLDAPGGGSVLVMEHMDMRHLSSHAAKLGAQLADLHLDNKKLGEM 

PRD hhhhhheeeeccceeeecccccceeeeecccccccchhhhhhhhhhhhhhhcccccchhh 

SEQ RLKEAGTVWRGGGQEERPFVARFGFDVVTCCGYLPQVNDWQEDWVVFYARQRIQPQMDMV

PRD hhhhhccccccccccccceeeccccceeeccccccccccccchhhhhhhhhhhhhhhhhh 

SEQ EKESGDREALQLWSALQ 

PRD hhhccchhhhhhhhccc 



Prosite for DKFZphfbr2_2d20.1

PS00002 20->24 GLYCOSAMINOGLYCAN PDOC00002
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005
PS00008 22->28 MYRISTYL PDOC00008
PS00008 104->110 MYRISTYL PDOC00008
PS00029 96->118 LEUCINE_ZIPPER PDOC00029



(No Pfam data available for DKFZphfbr2_2d20.1)
DKFZphfbr2_2gl8



group: brain derived 

DKFZphfbr2_2gl8 encodes a novel 229 amino acid protein with partial similarity to the human
dJ30M3.2 gene product.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

dJ30M3.2 extension of gene model

complete cDNA, complete cds, EST hits 
(mouse ESTs with >90% Identities) 

Sequenced by Qiagen 

Locus: /map="6p22.1-22"

Insert length: 2444 bp 

Poly A stretch at pos. 2425, no polyadenylation signal found 

1 TGGTCGAGGG TCGACGGTAT CGATAAGTTT TTTTTTTTTT TTTTTTTTTT 

51 TGGAAAGCAA GGATCACACT TCCCCCTCCC TGTTCCTTAA TCCCTTTTCT 

101 AAAAAGGGGG GAAAATCCGG ATGGATTTTA GGGATTGGTC TGGTGTCAGC 

151 TGTGTCTTAT TGCACACCTA AATCCTGATT ATAGGCTTTT CATTTCTCCG 

201 CAAAGCCTTT ATTTTGGCAG TTAAGCCAAA TGTGTTTTCC AGAAAGTTAG 

251 TTATTTTCTC CTCTTTCTTT CCTTTCTTTC CTCCCTTTTT CCCGTCTGAC 

301 CCCAAACGTT ATTGTCCAAA CATGACTGGA CAGCAGCTTT TGTTTCTTGA 

351 CCCTGTAATA TGACAGTCTG CTAATATTGA CAGAAGGTGC AGTTTTTGGG 

401 TTATAGTCGT GATTTTCGCT AATCAATCAT ATTAGCAGGA AAAAAAATGA

451 CTTGTTTCTG TTGTACTTGA GTCTTAAGAA AAAGTGCCCA TAGTTTAGTG 

501 ACAATTTCCA AAGGCTTTAG TACCACCTGT ATTTCAAAAT GGGGGACCCA 

551 AACTCCCGGA AGAAACAAGC TCTGAACAGA CTACGTGCTC AGCTTAGAAA 

601 GAAAAAAGAA TCTCTAGCTG ACCAGTTTGA CTTCAAGATG TATATTGCCT 

651 TTGTATTCAA GGAGAAGAAG AAAAAGTCAG CACTTTTTGA AGTGTCTGAG 

701 GTTATACCAG TCATGACAAA TAATTATGAA GAAAATATCC TGAAAGGTGT 

751 GCGAGATTCC AGCTATTCCT TGGAAAGTTC CCTAGAGCTT TTACAGAAGG

801 ATGTGGTACA GCTCCATGCT CCTCGATATC AGTCTATGAG AAGGGATGTA 

851 ATTGGCTGTA CTCAGGAGAT GGATTTCATT CTTTGGCCTC GGAATGATAT 

901 TGAAAAAATC GTCTGTCTCC TGTTTTCTAG GTGGAAAGAA TCTGATGAGC 

951 CTTTTAGGCC TGTTCAGGCC AAATTTGAGT TTCATCATGG TGACTATGAA 

1001 AAACAGTTTC TGCATGTACT GAGCCGCAAG GACAAGACTG GAATCGTTGT 

1051 CAACAATCCT AACCAGTCAG TGTTTCTCTT CATTGACAGA CAGCACTTGC 

1101 AGACTCCAAA AAACAAAGCT ACAATCTTCA AGTTATGCAG CATCTGCCTC 

1151 TACCTGCCAC AGGAACAGCT CACCCACTGG GCAGTTGGCA CCATAGAGGA 

1201 TCACCTCCGT CCTTATATGC CAGAGTAGAG TACTGACCAG CAAAATGGAG 

1251 AAGATCAGAG AATGCAGCAG CAGTTTTTTT TCTTGTTTTC TTACCACTTT 

1301 ATTCTTTCAG AGTTTAAAGA AAATGGACTC ATGCACAGAA CACTATGCAT

1351 TTTGAAACTT GTTCATCCTG GATTTTTTTA AATCATTTTT ATCTCAGAAC 

1401 TTAAACAAAA ATTAGATGTC GTGCACGGAC TGTGTGAAAG AAGATGCTTT 

1451 GCATATTTGC TGCACTGCAT CAGTATCTTA CTAAAAATGT GAAATGAAAG

1501 GACTATTGTA CACTGAAATG CTTAAATGTA TCTGAAAGCA CAAGGTGATA 

1551 CTCATTTTTA TGGTCTTCCC ATTTGTGCTG GTTTTTGCCT CTTTGACATC 

1601 TGTCATCAGT ATTTAGAGGG TGAGAAGTGA ATGTAACAGG TATAAATAAC 

1651 ATTTTTAAAA ACAATAACTT TGCTATAATC ACAGTTGTTC CAGAGCACTG 

1701 TCAGATACAT TCTAATGACC AGAACTGGTT TAAAAAAAGA AAATACAACC
1751 ATGGGAAAGA AATCTTAAAT GAAAAACGCA TCTCATTGTA GGCATTTTTG 

1801 CCTCATATTT TACTGGGCCA TGTTTGTTTC CTGGTACTCA TGTATTTTTT
1851 TTTTTTCCAG ATCTCTTTCC CCAAGTTGCT ATTGTAAGAG TATTCTGCTG 
1901 CGTGTGGATG CAGTTATACA CATTAAAGCA GATCTGGAGT CTGAAGTAGC 
1951 TATAAAGCAG CTATAAAACA GAAATACATG CATAGCTGCA GAAACCATGA 
2001 TAGGTAGAGG ACTTTTCTTT TGGTTTTGTT TTGTTTTGTT TTGTTTTGTT 
2051 TTTGGTTTTA CAGAGAAGAG ATTTTTATTA CAAAGAAAAA AATTCCAGTG
2101 AATTGTGCAG AAATGCTGGT TTTTACACCA TCCTAAAGAA AAACTTTACA 
2151 AGGGTGTTTT GGAGTAGAAA AAAGGTTATA AAGTTGGAAT CTTAAATTGT 
2201 AAAATTAACC ATTGAGTGTC AAAGTTCTAA AAGCAGAACT CATTTCGTGC 
2251 AATGAACATA AGGAAAGACT ACTGTATAGG TTTTTTTTTT TCTCCTTTTA
2301 AATGAAGAAA AGCTTTGCTT AAGGGTTGCA TACTTTTATT GGAGTAAATC 
2351 TGAATGATCC TACTCCTTTG GAGTAAGACT AGTGCTTACC AGTTTCCAAT 
2401 TGTATTTAGC TTCTGTTGGA ATTTGAAAAA AAAAAAAAAA AAAA

BLAST Results 
Entry HS338352 from database EMBL:
human STS EST171398.
Score = 1747, P = 3.0e-74, identities = 359/365 

Entry HS447255 from database EMBL: 
human STS SHGC-10143. 
Score = 1717, P = 6.5e-73, identities = 365/383 

Entry HS30M3 from database EMBLNEW: 

Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains 
three novel genes, one similar to C. elegans Y63D3A.4 and one similar 
to (predicted) plant, worm, yeast and archaea bacterial genes, and the 
first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG 
islands.

Score = 6646, P = 0.0e+00, identities = 1344/1355 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 539 bp to 1225 bp; peptide length: 229 
Category: putative protein 



1 MGDPNSRKKQ ALNRLRAQLR KKKESLADQF DFKMYIAFVF KEKKKKSALF 
51 EVSEVIPVMT NNYEENILKG VRDSSYSLES SLELLQKDVV QLHAPRYQSM
101 RRDVIGCTQE MDFILWPRND IEKIVCLLFS RWKESDEPFR PVQAKFEFHH
151 GDYEKQFLHV LSRKDKTGIV VNNPNQSVFL FIDRQHLQTP KNKATIFKLC
201 SICLYLPQEQ LTHWAVGTIE DHLRPYMPE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2gl8, frame 2 

TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel
protein)"; Human DNA sequence from clone 30M3 on chromosome
6p22.1-22.3. Contains three novel genes, one similar to C. elegans
Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains
ESTs, GSSs and putative CpG islands., N = 1, Score = 470, P = 1.1e-44



>TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)";
Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains 
three novel genes, one similar to C. elegans Y63D3A.4 and one similar to 
(predicted) plant, worm, yeast and archaea bacterial genes, and the first 
exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG islands. 
Length = 86

HSPs:

Score = 470 (70.5 bits), Expect = 1.1e-44, P = 1.1e-44
Identities = 86/86 (100%), Positives = 86/86 (100%)

Query: 144 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 203

AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC
Sbjct: 1 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 60

Query: 204 LYLPQEQLTHWAVGTIEDHLRPYMPE 229 

LYLPQEQLTHWAVGTIEDHLRPYMPE 
Sbjct: 61 LYLPQEQLTHWAVGTIEDHLRPYMPE 86 



Pedant information for DKFZphfbr2_2gl8, frame 2 



Report for DKFZphfbr2_2gl8.2 

[LENGTH] 229 
[MW] 27083.42 
[pI] 9.04 
[HOMOL] TREMBL:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)"; Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains three novel genes, one similar to C. elegans Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea bacterial genes, and the first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG islands. 6e-47 
[PROSITE] MYRISTYL 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 5.24 % 



SEQ MGDPNSRKKQALNRLRAQLRKKKESLADQFDFKMYIAFVFKEKKKKSALFEVSEVIPVMT 
SEG 
PRD cccccchhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhheeeeec 

SEQ NNYEENILKGVRDSSYSLESSLELLQKDVVQLHAPRYQSMRRDVIGCTQEMDFILWPRND 
SEG xxxxxxxxxxxx 
PRD cchhhhhhhcccccccccchhhhhhhhhhhhhhccccccccceeecccccceeeecccch 

SEQ IEKIVCLLFSRWKESDEPFRPVQAKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFL 
SEG 
PRD hhhhhhhhhhhccccccccccccccccccccchhhhhhhhhhhcccceeeeccccceeee 

SEQ FIDRQHLQTPKNKATIFKLCSICLYLPQEQLTHWAVGTIEDHLRPYMPE 
SEG 
PRD eeecccccccccceeeeeeeeeeeeeccccccccceeeecccccccccc 



Prosite for DKFZphfbr2_2gl8.2 

PS00001 175->179 ASN_GLYCOSYLATION PDOC00001 
PS00004 22->26 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 44->48 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005 
PS00005 162->165 PKC_PHOSPHO_SITE PDOC00005 
PS00005 189->192 PKC_PHOSPHO_SITE PDOC00005 
PS00006 25->29 CK2_PHOSPHO_SITE PDOC00006 
PS00006 80->84 CK2_PHOSPHO_SITE PDOC00006 
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006 
PS00007 69->77 TYR_PHOSPHO_SITE PDOC00007 
PS00008 70->76 MYRISTYL PDOC00008 
PS00008 168->174 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphfbr2_2gl8.2) 

DKFZphfbr2_2hl 



group: brain derived 

DKFZphfbr2_2hl encodes a novel 180 amino acid protein with weak similarity to C.elegans 
D2007.4 protein 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to C.elegans D2007.4 protein 
CpG island in 5' region, complete cDNA 
Sequenced by Qiagen 
Locus: unknown 
Insert length: 957 bp 

Poly A stretch at pos. 939, polyadenylation signal at pos. 916 

1 GGGGGTCCCT GACTTTATAT GGCTGCTCCT GGCGAGCGAC TGAGTCGTCC 

51 GTGAGGAAAA AGAGGCGAGG CTTTTCCGAG ATCGTCTCAG CGATGGCGCT 

101 TCGGTCGCGG TTTTGGGGGT TGTTCTCGGT TTGCAGGAAC CCTGGGTGCA 

151 GGTTCGCAGC CCTGTCAACC AGCTCCGAGC CGGCAGCGAA ACCTGAAGTG 

201 GACCCTGTGG AAAATGAAGC TGTCGCCCCA GAATTCACCA ACCGGAACCC 

251 CCGGAACCTG GAGCTTTTGT CTGTAGCCAG GAAAGAGCGG GGCTGGCGGA 

301 CGGTGTTTCC CTCCCGTGAG TTCTGGCACA GGTTGCGAGT TATAAGGACT 

351 CAGCATCATG TAGAAGCACT TGTGGAGCAT CAGAATGGCA AGGTTGTGGT 

401 TTCGGCCTCC ACTCGTGAGT GGGCTATTAA AAAGCACCTT TATAGTACCA 

451 GAAATGTGGT GGCTTGTGAG AGTATAGGAC GAGTGCTGGC ACAGAGATGC 

501 TTAGAGGCGG GAATCAACTT CATGGTCTAC CAACCAACCC CGTGGGAGGC 

551 AGCCTCAGAC TCGATGAAAC GACTACAAAG TGCCATGACA GAAGGTGGTG 

601 TGGTTCTACG GGAACCTCAG AGAATCTATG AATAAATGGA AGCATTAATT 

651 GTTTTGAACA TGTAAATATA AATCTGTCAG CCACTACAGC CATCAAAAGA 

701 GAGCATCTGG AAGAACAGCC AGCTTGGAAG TTTTACAGCA ATAATGTTGC 

751 AGTGGAATAT TATTTGTAGT TAAGGTCATC CTCCTCCCCT TTCTGTTTTT 

801 TTAAATCAAG AACTACGTTC TGCCCCTCTC TTGGGCTTCA GAAGCATCTA 

851 AGAAAAGCAG TCATCAATTA TAATTAACTT TCAAAGGGCA AGTCAGAAGT 

901 TGTTTATAAA TTACAAAATA AAGGCATATT ATGAACTCTA AAAAAAAAAA 

951 AAAAAAA 

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 93 bp to 632 bp; peptide length: 180 
Category: similarity to known protein 
Classification: unset 



1 MALRSRFWGL FSVCRNPGCR FAALSTSSEP AAKPEVDPVE NEAVAPEFTN 
51 RNPRNLELLS VARKERGWRT VFPSREFWHR LRVIRTQHHV EALVEHQNGK 
101 VVVSASTREW AIKKHLYSTR NVVACESIGR VLAQRCLEAG INFMVYQPTP 
151 WEAASDSMKR LQSAMTEGGV VLREPQRIYE 
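
The relationship between the cDNA insert and the frame 3 peptide can be checked by translating the start of the ORF. The sketch below is illustrative only: it assumes the standard genetic code, and the excerpt is bases 93 to 128 transcribed from the insert sequence listed above. The helper name `translate` is not part of the original annotation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 93-128 of the DKFZphfbr2_2hl insert (start of the reported ORF).
ORF_START_EXCERPT = "ATGGCGCTTCGGTCGCGGTTTTGGGGGTTGTTCTCG"
print(translate(ORF_START_EXCERPT))  # MALRSRFWGLFS
```

The printed residues correspond to the first twelve residues of the peptide shown above.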

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hl, frame 3 

PIR:S44789 D2007.4 protein - Caenorhabditis elegans, N = 1, Score = 
194, P = 2e-15 

PIR:JC5753 ribosomal protein L18 - Vibrio proteolyticus, N = 1, Score = 
121, P = 1.1e-07 



>PIR:S44789 D2007.4 protein - Caenorhabditis elegans 
Length = 170 



HSPs: 



Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15 
Identities = 51/134 (38%), Positives = 78/134 (58%) 
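
The percentages on the HSP summary lines can be recomputed from the counts. The sketch below parses such a line and recomputes the percentage. Its use of integer truncation rather than rounding is an assumption inferred from values in this listing (for example, 59/222 is reported as 26%); the function names are illustrative.

```python
import re

HSP_RE = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_hsp_stats(line: str) -> dict:
    """Return {name: (hits, length, reported_percent)} for one HSP summary line."""
    return {
        name: (int(hits), int(length), int(pct))
        for name, hits, length, pct in HSP_RE.findall(line)
    }


def percent(hits: int, length: int) -> int:
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return 100 * hits // length


stats = parse_hsp_stats("Identities = 51/134 (38%), Positives = 78/134 (58%)")
for name, (hits, length, reported) in stats.items():
    print(name, percent(hits, length), reported)
```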



Query: 48 FTNRNPRNLELLSVARKERGWRTVFP--SREFWHRLRVIRTQHHVEA-LVEHQNGKVVVS 104 

F NRNPRN EL+ G++ +R + +++ ++ + H E LV +Q+G VV+S 

Sbjct: 9 FVNRNPRNNELMGRQAPNTGYQFEKDRAARSYIYKVELVEGKSHREGRLVHYQDG-VVIS 67 

Query: 105 ASTREWAIKKHLYSTRNVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQ-- 162 

AST+E +I LYS + A +IGRVLA RCL++GI+F + T EA S + 
Sbjct: 68 ASTKEPSIASQLYSKTDTSAALNIGRVLALRCLQSGIHFAMPGATK-EAIEKSQHQTHFF 126 

Query: 163 SAMTEGGVVLREPQRI 178 

A+ E G+ L+EP + 
Sbjct: 127 KALEEEGLTLKEPAHV 142 



Pedant information for DKFZphfbr2_2hl, frame 3 



Report for DKFZphfbr2_2hl.3 



[LENGTH] 180 

[MW] 20576.57 

[pI] 9.63 

[HOMOL] PIR:S44789 D2007.4 protein - Caenorhabditis elegans 2e-13 

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0794] 2e-04 

[SUPFAM] Escherichia coli ribosomal protein L18 8e-06 

[KW] Alpha_Beta 



SEQ MALRSRFWGLFSVCRNPGCRFAALSTSSEPAAKPEVDPVENEAVAPEFTNRNPRNLELLS 

PRD ccccccceeeeeeeecccccceeeecccccccccccccccceeeecccccccccchhhhh 

SEQ VARKERGWRTVFPSREFWHRLRVIRTQHHVEALVEHQNGKVVVSASTREWAIKKHLYSTR 

PRD hhhhcccccccchhhhhhhhhhccccchhhhhhhhhcccceeeeechhhhhhhhhhhhcc 

SEQ NVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQSAMTEGGVVLREPQRIYE 

PRD ccceeehhhhhhhhhhhhhcceeeeeccccchhhhhhhhhhhhhhhccceeecccccccc 



(No Prosite data available for DKFZphfbr2_2hl.3) 
(No Pfam data available for DKFZphfbr2_2hl.3) 

DKFZphfbr2_2hlO 
group: brain derived 

DKFZphfbr2_2hlO encodes a novel 220 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2176 bp 

Poly A stretch at pos. 2161, polyadenylation signal at pos. 2143 



1 TGGGGAGTAT TCTAATTATA TTTTATATTT AATAAATTAT TTTTCTATTT 
51 CTTTGTTATA TTAAGTTGCA CACTTGTTTC TTTTATCCAG AAAGTTTAGT 
101 ATAATAAAAA TAGTTTTAAG ATTAACTGTG AATGTAAAGG AAAAGTATTA 
151 TTAATTATTT CAGGAAATTG CAAGACCTAA CATGGCTGAA AGAGAAACAG 
201 AAACATCAAA TTCTGAAAGT AAACAAGATA AAGCTGCTTC TTCAAAAGAA 
251 AAAAATGGAT GTAATGCAAA TTCATTTGAA GGCTCATCAA CAACAAAAAG 
301 TGAAGAAAGC ATAACAGTTT CAGATAAGGA AAATGAAACC TGTCTTGCAG 
351 ACCAGGAAAC TGGCTCAAAA AACATCGTCA GTTGTGATTC AAATATTGGT 
401 GCAGATAAAG TGGAAAAGAA AAAACAAATA CAACACGTTT GTCAGGAAAT 
451 GGAGTTGAAG ATGTGCCAGA GTTCAGAAAA CATAATCTTA TCTGATCAGA 
501 TTAAAGATCA CAACTCCAGT GAAGCCAGAT TTTCTTCAAA GAATATTAAG 
551 GATTTGCGAT TAGCATCAGA TAATGTAAGC ATTGATCAGT TTTTGAGAAA 
601 AAGACATGAA CCTGAATCTG TTAGTTCTGA TGTTAGCGAG CAAGGCAGTA 
651 TTCATTTGGA ACCTCTGACT CCATCCGAGG TACTTGAGTA TGAAGCCACA 
701 GAGATTCTTC AGAAAGGTAG TGGTGATCCT TCAGCCAAGA CTGATGAAGT 
751 AGTGTCTGAT CAAACAGATG ACATTCCTGG AGGAAATAAC CCTAGCACAA 
801 CAGAGGCAAC AGTAGACCTG GAAGATGAAA AAGAAACAAG TTGAAATTAC 
851 TCATTTTAAG TTTCAGTGTA CCAACGATAA GGGCATTTGG AACAGTGCTA 
901 TCAGGTGAGC TCAGTGGTGC TGTTGTAGGT TCAGAAATGG AAATATGTAA 
951 GGGAGGTCAC ACATACACTT TACCTGTATG TTCAACCTAT GTTATCAAAC 
1001 AAACCAATTC ACCAATAATA GCATGATTAG TAGGGATTCC CAAAAAGTTT 
1051 TTAAAAACAC GAACAGGATT TTAATGATAA TTAAATTTGC AGTGGAAAGG 
1101 TCTCATTTAA TGGTTTTCAA GGAAATGGGA TTTGGTTGCT GACATGAATT 
1151 GATGATATTA GTAATATTTA TAAAGCCTTT CAAACTTCCA TCAATCCTAA 
1201 GCTAAAAATC TTTATTACCT GTATATCCTT TTCAGTTAAC TGAGAGGAAG 
1251 GGATTTGGAA ACCATGTACT TTTGGGGAGT AATTGATTAA AAACAATGGC 
1301 TGATTGGCAT TGTTAATGAA GGCTTTATTT GTGAGGATGA TGCTGGTAAA 
1351 TGGAGCATGC TTAGAGTACT AAATTGATCT AATGAGAATT TGGATGAACA 
1401 TAAACTTAAT TTTGGATTTA ATATAACATT CCAGTCAGAC GCATGTAAAC 
1451 AGAATATTTG AATCTTTGIA CCTCCATACA AGTGTTAGCC TGCCAGGCTG 
1501 TAAGCTTACC TTAATTAAAC TTTCAGTGAA AGTGGAATTA TTAAGATATA 
1551 AATTTATATT TGTGCTTTTT GTCAGTGTGT AAGCTGTGTA GAAATTCTTT 
1601 GATGTATTAG TTGTATTAAT GTAAAGTAGA AACCCATTGT TGAAACTCCT 
1651 GTAGCTATTA TGCTTTTAAT ATTGTTTTAA TGTTCTTCCT TAGAAATAGG 
1701 CCCATAAAAA TGGTCTGGAA GCCAAACCAA AGTATGGTAT AATGTAGATA 
1751 TTGTAAAGCA GTAAACTGAA AACATGTCCT GGCATGTATT CAGCCATGTT 
1801 TAAGTGACTT TTCTGTAATT GTAAAATAAA AACTTCAAAT GGGACCTAAA 
1851 ACAGTGATGT AAAAGAACTG GTTTTGGAAA TTTAGCCTAA TTTATCTATA 
1901 AGATGGCTGC TAAATTGATT TTTCAGTTCT TTTTATCATC TAAAATATAA 
1951 TAGATATAGA AATGAATAAT ATGAAGAACA GTAGTTTGCT TTGAAATACT 
2001 AATAAACTTT TATTTAAGAT GCTTCATTTT TACTTCTTAA AACGTGCTTT 
2051 GGATTCTTAA ATTTTGTTTC ACTGAATGTT CAATGTTTTA AATGGCGATT 
2101 AAAATACTCT GCTGTATATA GTAGTTTTTG AGTAAATATT TGCAATAAAA 
2151 ATCTGCCCCC GAAAAAAAAA AAAAAA 



BLAST Results 



Entry G35287 from database EMBL: 
human STS SHGC-37375. 
Score = 2163, P = 2.8e-91, identities = 437/441 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 182 bp to 841 bp; peptide length: 220 
Category: putative protein 



1 MAERETETSN SESKQDKAAS SKEKNGCNAN SFEGSSTTKS EESITVSDKE 
51 NETCLADQET GSKNIVSCDS NIGADKVEKK KQIQHVCQEM ELKMCQSSEN 
101 IILSDQIKDH NSSEARFSSK NIKDLRLASD NVSIDQFLRK RHEPESVSSD 
151 VSEQGSIHLE PLTPSEVLEY EATEILQKGS GDPSAKTDEV VSDQTDDIPG 
201 GNNPSTTEAT VDLEDEKERS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hlO, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2hlO, frame 2 



Report for DKFZphfbr2_2hlO.2 

[LENGTH] 220 
[MW] 24109.02 
[pI] 4.51 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 4e-05 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 4e-05 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] TNFR/NGFR cysteine-rich region 
[KW] Alpha_Beta 



SEQ MAERETETSNSESKQDKAASSKEKNGCNANSFEGSSTTKSEESITVSDKENETCLADQET 

PRD cccccccccccccchhhhhhhhccccccccccccccccceeeeeeeeccccccccccccc 

SEQ GSKNIVSCDSNIGADKVEKKKQIQHVCQEMELKMCQSSENIILSDQIKDHNSSEARFSSK 

PRD cccceeeecccccchhhhhhhhhhhhhhhhhhhhhhccceeeeccccccccccccccccc 

SEQ NIKDLRLASDNVSIDQFLRKRHEPESVSSDVSEQGSIHLEPLTPSEVLEYEATEILQKGS 

PRD cchhhhhhcccchhhhhhhhcccccccccccccccceeecccccccchhhhhhhcccccc 

SEQ GDPSAKTDEVVSDQTDDIPGGNNPSTTEATVDLEDEKERS 

PRD ccccccccccccccccccccccccccceeeehhhhhhccc 



Prosite for DKFZphfbr2_2hlO.2 

PS00001 51->55 ASN_GLYCOSYLATION PDOC00001 
PS00001 111->115 ASN_GLYCOSYLATION PDOC00001 
PS00001 131->135 ASN_GLYCOSYLATION PDOC00001 
PS00005 20->23 PKC_PHOSPHO_SITE PDOC00005 
PS00005 37->40 PKC_PHOSPHO_SITE PDOC00005 
PS00005 47->50 PKC_PHOSPHO_SITE PDOC00005 
PS00005 118->121 PKC_PHOSPHO_SITE PDOC00005 
PS00005 184->187 PKC_PHOSPHO_SITE PDOC00005 
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 20->24 CK2_PHOSPHO_SITE PDOC00006 
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006 
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 205->209 CK2_PHOSPHO_SITE PDOC00006 
PS00008 26->32 MYRISTYL PDOC00008 
PS00008 34->40 MYRISTYL PDOC00008 
PS00008 201->207 MYRISTYL PDOC00008 



Pfam for DKFZphfbr2_2hlO.2 

HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeG.tYtD.WNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC* 

+ E+ T +D +N ++C E G+ + +C+++ + 

Query 40 SEESITVSDKEN--ETC--LADQET--GSKNIVSCDSNIGADK 76 

group: intracellular transport and trafficking 

DKFZphfbr2_2il7.3 encodes a novel 201 amino acid putative GTP-binding protein related to 
Rab1B. 

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the 
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and 
endocytotic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of 
transport vesicles to their acceptor membranes. Rab1B is essential for the intracellular 
transport of nascent low density lipoprotein (LDL) receptor. It has been proposed as a universal 
mediator of endoplasmic reticulum to Golgi transport of membrane glycoproteins in mammalian 
cells. 

The new protein can find clinical application in modulating the transport of glycoproteins 
inside cells, especially of the LDL receptor. 



Medline 

96245776: Intracellular transport and maturation of nascent low density 
lipoprotein receptor is blocked by mutation in the Ras-related 
GTP-binding protein, Rab1B 



strong similarity to rab1 

complete cDNA, complete cds, start at 47, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1985 bp 

Poly A stretch at pos. 1901, polyadenylation signal at pos. 1859 



1 GGGAGCAGAG TCGACTGGGA GCGACCGAGC GGGCCGCCGC CGCCGCCATG 
51 AACCCCGAAT ATGACTACCT GTTTAAGCTG CTTTTGATTG GCGACTCAGG 
101 CGTGGGCAAG TCATGCCTGC TCCTGCGGTT TGCTGATGAC ACGTACACAG 
151 AGAGCTACAT CAGCACCATC GGCCTCCACT TCAAGATCCG AACCATCGAG 
201 CTGGATGGCA AAACTATCAA ACTTCAGATC TGGGACACAG CGGGCCAGGA 
251 ACGGTTCCGG ACCATCACTT CCAGCTACTA CCGGGGGGCT CATGGCATCA 
301 TCGTGGTGTA TGACGTCACT GACCAGGAAT CCTACGCCAA CGTGAAGCAG 
351 TGGCTGCAGG AGATTGACCG CTATGCCAGC GAGAACGTCA ATAAGCTCCT 
401 GGTGGGCAAC AAGAGCGACC TCACCACCAA GAAGGTGGTG GACAACACCA 
451 CAGCCAAGGA GTTTGCAGAC TCTCTGGGCA TCCCCTTCTT GGAGACGAGC 
501 GCCAAGAATG CCACCAATGT CGAGCAGGCG TTCATGACCA TGGCTGCTGA 
551 AATCAAAAAG CGGATGGGGC CTGGAGCAGC CTCTGGGGGC GAGCGGCCCA 
601 ATCTCAAGAT CGACAGCACC CCTGTAAAGC CGGCTGGCGG TGGCTGTTGC 
651 TAGGAGGGGC ACATGGAGTG GGACAGGAGG GGGCACCTTC TCCAGATGAT 
701 GTCCCTGGAG GGGGGAGGAG GTACCTCCCT CTCCCTCTCC TGGGGCATTT 
751 GAGTCTGTGG CTTTGGGGTG TCCTGGGCTC CCCATCTCCT TCTGGCCCAT 
801 CTGCCTGCTG CCCTGAGCCC CGGTTCTGTC AGGGTCCCTA AGGGAGGACA 
851 CTCAGGGCCT GTGGCCAGGC AGGGCGGAGG CCTGCTGTGC AGTTGCCTCT 
901 AGGTGACTTT CCAAGATGCC CCCCTACACA CCTTTCTTTG GAACGAGGGC 
951 TCTTCTGTCG GTGTCCCTCC CACCCCCATG TATGCTGCAC TGGGTTCTCT 
1001 CCTTCTTCTT CCTGCTGTCC TGCCCAAGAA CTGAGGGTCT CCCCGGCCTC 
1051 TACTGCCCTG GCTGCAGTCA GTGCCCAGGG CGAGGAATGT GGCCAGGGGA 
1101 TCCAGGACCT GGGATCCAGG GCCCTGGGCT GGACCTCAGG ACAGGCATGG 
1151 AGGCCACAGG GGCCCAGCAG CCCACCCTTT CCTCTCCCCA CTGCCTCCTC 
1201 TCCCTTCCTA CACTCCCAGC TCGAGCCGTC CAGCTGCGGT GGGATCTGAG 
1251 TATATCTAGG GCGGGTGGGC GGGTAGCAGT GCTGGGCCTG TGTCTTGAGC 
1301 CTGGAGGGAG ACTGCTCCTG CCGCCCTCTG CCCTGCCGGA GACAGACCCA 
1351 TGCGCTGCCT GCCCACCGTG CCCCTTTGTC CCCATGTCAG GCGGAGGCGG 
1401 AAGGCCCACC GTGCCAGAGG CTGGGCACCA GCCTTAACCC TCACTCTGCT 
1451 AGCACCTCCT CCCTTTCCCC AAGGTAGCAC ATCTGGCTCA CTCCCCACTC 
1501 CGTCTCTGGA GCCCACCAGG GAAGGCCCTC ATCCCCTGCC GCTACTTCTC 
1551 TGGGGAATGT GGGTTCCATC CAGGATTGGG GGCCTCTCTG CTCACCCACT 
1601 CTGCACCCAG GATCCTAGTC CCCTGCCCTC TGGCACAGCT GCTTCCTGCA 
1651 AGAAAGCAAG TCTTTGGTCT CCCTGAGAAG CCATGTCCCT CGTGCTGTCT 
1701 CTTGCCTGTC CCACCTGTGC CCTGCCCTCC AGCTTGTATT TAAGTCCCTG 
1751 GGCTGCCCCC TTGGGGTGCC CCCCGCTCCC AGGTTCCCCT CTGGTGTCAT 
1801 GTCAGGCATT TTGCAAGGAA AAGCCACTTG GGGAAAGATG GAAAAGGACA 
1851 AAAAAAATTA ATAAATTTCC ATTGGCCCTC GGGTGAGCTG AGGGTTTTTG 
1901 CAAGGAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1951 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



91115900: 

A family of ras-like GTP-binding proteins expressed in electromotor 
neurons. 



Peptide information for frame 3 



ORF from 48 bp to 650 bp; peptide length: 201 
Category: strong similarity to known protein 



1 MNPEYDYLFK LLLIGDSGVG KSCLLLRFAD DTYTESYIST IGVDFKIRTI 
51 ELDGKTIKLQ IWDTAGQERF RTITSSYYRG AHGIIVVYDV TDQESYANVK 
101 QWLQEIDRYA SENVNKLLVG NKSDLTTKKV VDNTTAKEFA DSLGIPFLET 
151 SAKNATNVEQ AFMTMAAEIK KRMGPGAASG GERPNLKIDS TPVKPAGGGC 
201 C 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2il7, frame 3 

SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B., N = 1, Score = 1023, P 
= 2.7e-103 

PIR:S06147 GTP-binding protein rab1B - rat, N = 1, Score = 1013, P = 
3.2e-102 

SWISSPROT:RAB1_DISOM RAS-RELATED PROTEIN ORAB-1., N = 1, Score = 967, P 
= 2.4e-97 

PIR:TVHUYP GTP-binding protein Rab1 - human, N = 1, Score = 966, P = 
3e-97 



>SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 
Length = 201 

HSPs: 



Score = 1023 (153.5 bits), Expect = 2.7e-103, P = 2.7e-103 
Identities = 197/201 (98%), Positives = 199/201 (99%) 



Query:   1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 
           MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 
Sbjct:   1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 

Query:  61 IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 
           IWDTAGQERFRT+TSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 
Sbjct:  61 IWDTAGQERFRTVTSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 

Query: 121 NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 
           NKSDLTTKKVVDNTTAKEFADSLG+PFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 
Sbjct: 121 NKSDLTTKKVVDNTTAKEFADSLGVPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 

Query: 181 GERPNLKIDSTPVKPAGGGCC 201 
           GERPNLKIDSTPVK A GGCC 
Sbjct: 181 GERPNLKIDSTPVKSASGGCC 201 


Pedant information for DKFZphfbr2_2il7, frame 3 



Report for DKFZphfbr2_2il7.3 

[LENGTH] 201 
[MW] 22171.25 
[pI] 5.56 
[HOMOL] SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 1e-112 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL038c] 2e-77 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YFL038c] 2e-77 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-57 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YER031c] 8e-46 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YER031c] 8e-46 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 1e-44 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089C] 1e-30 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YNL098C] 3e-25 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YNL098c] 3e-25 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098C] 3e-25 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 9e-24 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 9e-24 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 4e-23 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-17 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 7e-17 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 1e-16 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 1e-11 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 1e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 4e-10 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 9e-09 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 3e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL048c] 5e-05 
[BLOCKS] BL01019A ADP-ribosylation factors family proteins 
[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 
[SCOP] d1plk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 2e-41 
[SCOP] d1guaa_ 3.25.1.3.10 Rap1A [Human (Homo sapiens) 5e-60 
[SCOP] d1rrga_ 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-30 
[SCOP] d1hura_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Hom 2e-33 
[PIRKW] nucleus 1e-21 
[PIRKW] membrane trafficking 1e-110 
[PIRKW] oncogene 1e-25 
[PIRKW] endoplasmic reticulum 1e-105 
[PIRKW] phosphoprotein 1e-105 
[PIRKW] glycoprotein 3e-25 
[PIRKW] prenylated cysteine 1e-110 
[PIRKW] signal transduction 4e-23 
[PIRKW] transforming protein 1e-105 
[PIRKW] purine nucleotide binding 2e-24 
[PIRKW] alternative splicing 5e-26 
[PIRKW] P-loop 1e-110 
[PIRKW] lipoprotein 1e-110 
[PIRKW] proto-oncogene 3e-27 
[PIRKW] methylated carboxyl end 3e-27 
[PIRKW] hydrolase 7e-25 
[PIRKW] membrane protein 1e-105 
[PIRKW] GTP binding 1e-110 
[PIRKW] thiolester bond 5e-76 
[PIRKW] Golgi apparatus 1e-105 
[SUPFAM] ras transforming protein 1e-110 
[PROSITE] ATP_GTP_A 1 
[PROSITE] MYRISTYL 2 
[PROSITE] CK2_PHOSPHO_SITE 5 
[PROSITE] SIGMA54_INTERACT_1 1 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 

SEQ MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 

221p- EEEEEEETTTTCHHHHHHHHHHCCCCCCCCCTTTEEEE-EEEEETTEEEEEE 

SEQ IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 

221p- EEECTTTTTTCGGGHHHHHHCCEEEEEEETTBHHHHHHHHHHHHHHHHHHTTTTCEEEEE 

SEQ NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 

221p- ETTTTCCC-CCCHHHHHHHHHHCCCCEEEETTTTTTTHHHHHHHHHHHHHH 

SEQ GERPNLKIDSTPVKPAGGGCC 

221p- 



Prosite for DKFZphfbr2_2il7.3 

PS00001 121->125 ASN_GLYCOSYLATION PDOC00001 
PS00001 133->137 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005 
PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005 
PS00005 135->138 PKC_PHOSPHO_SITE PDOC00005 
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006 
PS00007 27->34 TYR_PHOSPHO_SITE PDOC00007 
PS00008 18->24 MYRISTYL PDOC00008 
PS00008 176->182 MYRISTYL PDOC00008 
PS00017 15->23 ATP_GTP_A PDOC00017 
PS00675 11->25 SIGMA54_INTERACT_1 PDOC00579 



Pfam for DKFZphfbr2_2il7.3 

HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM       *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
           KL+LIGDSGVGKSCLL+RF +++++E+YI+TIGVDF+++TIE+DGKTIK 
Query  10  KLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIK 58 

HMM        LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
           LQIWDTAGQER+R+++++YYRGA+G+++VYD+T+++S+ N+++W++EI+R 
Query  59  LQIWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDR 108 

HMM        HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
           +++ ENV ++LVGNK+DL +++V+ +++EFA+++G IPF+ETSAK++ 
Query 109  YAS--ENVNKLLVGNKSDLTTKKVVDNTTAKEFADSLG-IPFLETSAKNA 155 

HMM        iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk...rCCCIM* 
           +NVE+AFM+++ EI++RM+ +++E +N++ +S++ K +CC 
Query 156  TNVEQAFMTMAAEIKKRMGPGAASGGERPNLKIDSTPVKPAGGGCC-- 201 

DKFZphfbr2_2kl9 



group: brain derived 

DKFZphfbr2_2kl9 encodes a novel 303 amino acid protein with similarity to human KIAA0378 
product. 

The protein contains a leucine zipper, which can mediate protein-protein interaction. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to KIAA0378 

encoded by the genomic clones HS147M19/HS608E8 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1931 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GGGGGGGGCG CGCGGTGACA GCGCGGGGTT GGCGGCGTGG GACCCAGGGG 
51 GCGACAGAGG CAGCAGCAGC CCGAGGCCTG AGGAGAGGAG ACCGGCGGCG 
101 GCGGCAATGC TGGAGACCCT TCGCGAGCGG CTGCTGAGCG TGCAGCAGGA 
151 TTTCACCTCC GGGCTGAAGA CTTTAAGTGA CAAGTCAAGA GAAGCAAAAG 
201 TGAAAAGCAA ACCCAGGACT GTTCCATTTT TGCCAAAGTA CTCTGCTGGA 
251 TTAGAATTAC TTAGCAGGTA TGAGGATACA TGGGCTGCAC TTCACAGAAG 
301 AGCCAAAGAC TGTGCAAGTG CTGGAGAGCT GGTGGATAGC GAGGTGGTCA 
351 TGCTTTCTGC GCACTGGGAG AAGAAAAAGA CAAGCCTCGT GGAGCTGCAA 
401 GAGCAGCTCC AGCAGCTCCC AGCTTTAATC GCAGACTTAG AATCCATGAC 
451 AGCAAATCTG ACTCATTTAG AGGCGAGTTT TGAGGAGGTA GAGAACAACC 
501 TGCTGCATCT GGAAGACTTA TGTGGGCAGT GTGAATTAGA AAGATGCAAA 
551 CATATGCAGT CCCAGCAACT GGAGAATTAC AAGAAAAATA AGAGGAAGGA 
601 ACTTGAAACC TTCAAAGCTG AACTAGATGC AGAGCACGCC CAGAAGGTCC 
651 TGGAAATGGA GCACACCCAG CAAATGAAGC TGAAGGAGCG GCAGAAGTTT 
701 TTTGAGGAAG CCTTCCAGCA GGACATGGAG CAGTACCTGT CCACTGGCTA 
751 CCTGCAGATT GCAGAGCGGC GAGAGCCCAT AGGCAGCATG TCATCCATGG 
801 AAGTGAACGT GGACATGCTG GAGCAGATGG TCCTGATGGA CATATCGGAC 
851 CAGGAGGCCC TGGACGTCTT CCTGAACTCT GGAGGAGAAG AGAACACTGT 
901 GCTGTCCCCC GCCTTAGGTA GGGTTGACAA ACTTGCATTA GCTGAACCAG 
951 GGCAGTATCG ATGCCACTCC CCTCCAAAGG TGAGACGTGA GAACCATCTG 
1001 CCAGTCACTT ACGCATAAAC CCCCAAGCTC ACAGCCAGCT CCTGGCTCCC 
1051 TAACCCCACG GTTCCACACG GCTGTGTGGC AGCTGCAACA GTGGTGTGGT 
1101 TCCGTCATGA ATTCTTCTCA AAGATTTGAC ATGCTCCACT CCGGTAACTT 
1151 TGGTGAGTTG AGAGCTTTCT TGTTTGTTTT CCCTCCTTTA CCATCCAGAA 
1201 ATCCATTTGA GTCTGCTCCT TGTGGTTAAG GACTGGCGTT TGCAGGGAGG 
1251 TGCGGACTCT CCTGCGGGGC TCACGGGAAA CTCTTCCCTC TTCGTGCGAC 
1301 AGGCATTTAG GGGCGTGCCT GCCATGGGCA AAGCCATGGT GTGTGTTCAG 
1351 CTCTTGGCCT GTGTTGTAAA CTTAGTTGCA CTTCAGTTCC TTTCATCCCT 
1401 TCACAAAATT TTGTTTCACA TTCATGCAGC AAATATGGGC TGAGGTGCCA 
1451 GACCTGTACC TGGGCTTGGT GCGTTTCAAA TTTCAGACCA GTTCTTTGGG 
1501 CTGGGTCAAG GCAAAGCICA GTCGTCCCAG CAGCACCTCA GCCATCTGTA 
1551 GAAGGTTCTA CCATTACCAC GGTTTCAGCT TCCTCTAAAC TTCTCACCCG 
1601 CTTCTCCTGG CAATCTGTCA GAACGGTGTC ATCCTGGGGA AGAGAAGGAG 
1651 CTTGGGTGCA TTTGCCCTCA TCCTGAGAAG GCCAGAATAC TGGAGACCAG 
1701 CGTGAACCCT CACCCAGAGT CAGGGGAAGA TTTAGAAACA GTGACACCTG 
1751 CATATAGAAT TTTGATTCCT TGAAGAGCCT ATTTAGTTCC ATAAAATTGG 
1801 AGAACTGCTG AAGGTCAGTA ATTCCGACTT TCTCAGCAGT GGTGTCTCTG 
1851 AATTACTGCA AAGGGTAAAA AAAAAAAAAA AAAAAACTTA TCGATACCGT 
1901 CGACCTCGAT GATGATGATG ATGATGTCGA C 



BLAST Results 



Entry HS147M19 from database EMBL: 

Homo sapiens DNA sequence from PAC 147M19 on chromosome 6p22.1-22.3. 

Contains an unknown gene, ESTs and GSSs. 

Score = 5540, P = 4.1e-275, identities = 1114/1120 

3 exons 592-1884 

Entry HS608E8 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 608E8 
Score = 797, P = 1.2e-78, identities = 161/163 
6 exons 1-592 



Medline entries 



90294724: 

The involucrin gene of the gibbon: The middle region shared by the 
hominoids 



Peptide information for frame 2 



ORF from 107 bp to 1015 bp; peptide length: 303 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (97-119) 



1 MLETLRERLL SVQQDFTSGL KTLSDKSREA KVKSKPRTVP FLPKYSAGLE 
51 LLSRYEDTWA ALHRRAKDCA SAGELVDSEV VMLSAHWEKK KTSLVELQEQ 
101 LQQLPALIAD LESMTANLTH LEASFEEVEN NLLHLEDLCG QCELERCKHM 
151 QSQQLENYKK NKRKELETFK AELDAEHAQK VLEMEHTQQM KLKERQKFFE 
201 EAFQQDMEQY LSTGYLQIAE RREPIGSMSS MEVNVDMLEQ MVLMDISDQE 
251 ALDVFLNSGG EENTVLSPAL GRVDKLALAE PGQYRCHSPP KVRRENHLPV 
301 TYA 
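
The LEUCINE_ZIPPER hit listed above (97-119) can be reproduced with a simple pattern scan. The sketch below assumes the Prosite PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L, written here as a regular expression, and reports 1-based start positions with an end position one past the last matched residue. That end convention is inferred from the coordinates in this listing rather than stated in it. The helper name `find_motif` is illustrative.

```python
import re

# Frame 2 peptide of DKFZphfbr2_2kl9, transcribed from the listing above.
PEPTIDE = (
    "MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLE"
    "LLSRYEDTWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQ"
    "LQQLPALIADLESMTANLTHLEASFEEVENNLLHLEDLCGQCELERCKHM"
    "QSQQLENYKKNKRKELETFKAELDAEHAQKVLEMEHTQQMKLKERQKFFE"
    "EAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQMVLMDISDQE"
    "ALDVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPV"
    "TYA"
)

# Lookahead so that overlapping occurrences are also reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_motif(pattern, seq):
    """Return (start, end) pairs: 1-based start, end one past the last residue."""
    hits = []
    for m in pattern.finditer(seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


print(find_motif(LEUCINE_ZIPPER, PEPTIDE))
```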

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2kl9, frame 2 

TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, 
partial cds., N = 1, Score = 137, P = 4.8e-06 

PIR:I37037 involucrin - common gibbon, N = 1, Score = 124, P = 7.4e-05 

PIR:A57013 early endosome antigen 1 - human, N = 1, Score = 128, P = 
9.5e-05 



>TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, partial 
cds. 

Length = 808 

HSPs: 



Score = 137 (20.6 bits), Expect = 4.8e-06, P = 4.8e-06 
Identities = 59/222 (26%), Positives = 103/222 (46%) 

Query: 2 LETLRERLLSVQQDFTSGLKTL---SDKSREAKVKS-KPRTVPFLPKYSAGLELLSRYED 57 
L TL E L S ++ LK D+ R +++S + K +A L+ E 
Sbjct: 434 LATLEEAL-SEKERIIERLKEQRERDDRERLEEIESFRKENKDLKEKVNALQAELTEKES 492 

Query: 58 TWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTAN 117 
+ L A A SAG DS++ L E+KK +L+ QL++ ID M 
Sbjct: 493 SLIDLKEHASSLASAGLKRDSKLKSLEIAIEQKKEECSKLEAQLKKAHN-IEDDSRMNPE 551 

Query: 118 LTHLEASFEEVENNLLHLEDLCG--QCELERCKHMQSQQLENYKKNKRK---ELETFKAE 172 
++++ + D CG Q E++R + +++EN K +K K ELE+ 
Sbjct: 552 FAD---QIKQLDKEASYYRDECGKAQAEVDRLLEIL-KEVENEKNDKDKKIAELESLTLR 607 

Query: 173 LDAEHAQKVLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAE 220 
+ +KV ++H QQ++ K+ + EE +++ ++ +LQI E 
Sbjct: 608 HMKDQNKKVANLKHNQQLEKKKNAQLLEEVRRREDSMADNSQHLQIEE 655 

Score = 100 (15.0 bits), Expect = 6.2e-02, P = 6.0e-02 
Identities = 44/156 (28%), Positives = 76/156 (48%) 

Query: 57 DTWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPAL-IADLESMT 115 
D A+ +R +C A VD + +L E +K + +L+ L + D 
Sbjct: 560 DKEASYYR--DECGKAQAEVDRLLEILK-EVENEKNDKDKKIAELESLTLRHMKDQNKKV 616 

Query: 116 ANLTHLEASFEEVENNLLHLEDLCGQCE--LERCKHMQSQQLENYKKNKRKELETFKAEL 173 
ANL H + E+ +N L LE++ + + + +H+Q ++L N + R+EL+ KA L 
Sbjct: 617 ANLKHNQ-QLEKKKNAQL-LEEVRRREDSMADNSQHLQIEELMNALEKTRQELDATKARL 674 

Query: 174 DAEHAQKVLEME-HTQQMKLKERQKFFEEAFQQDMEQYLS 212 

A Q + E E H +++ ER+K EE + E L+ 
Sbjct: 675 -ASTQQSLAEKEAHLANLRI-ERRKQLEEILEMKQEALLA 712 

Pedant information for DKFZphfbr2_2kl9, frame 2 



Report for DKFZphfbr2_2kl9.2 

[LENGTH] 303 

[MW] 34814.78 

[pI] 5.23

[PROSITE] LEUCINE_ZIPPER 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 3.63 % 

[KW] COILED_COIL 14.52 % 

SEQ MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWA 

SEG 

PRD ccchhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccchhhhhhhhhhhhchhh 
COILS 

SEQ ALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhchhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccc 

SEQ LEASFEEVENNLLHLEDLCGQCELERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQK 

SEG 

PRD hhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccc

SEQ VLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhh 

COILS 

SEQ MVLMDISDQEALDVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPV 

SEG 

PRD hhhhhhchhhhhhhhhccccccceeeccccccccceecccccccccccccceeecccccc 

COILS 

SEQ TYA 
SEG 

PRD ccc
COILS 

Prosite for DKFZphfbr2_2kl9.2
PS00029 97->119 LEUCINE_ZIPPER PDOC00029 
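The PS00029 hit above can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE leucine-zipper consensus L-x(6)-L-x(6)-L-x(6)-L and uses only the first 120 residues of the peptide as printed in the SEQ rows; the names PEPTIDE, LEUCINE_ZIPPER and find_zippers are illustrative and not part of the PEDANT output. It reports an inclusive end position, so the hit appears as 97..118, whereas the report lists 97->119.

```python
import re

# First 120 residues of DKFZphfbr2_2kl9.2, copied from the SEQ rows of the report.
PEPTIDE = (
    "MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWA"
    "ALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTH"
)

# Assumed consensus for PS00029: L-x(6)-L-x(6)-L-x(6)-L
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_zippers(seq):
    """Return 1-based, inclusive (start, end) pairs of non-overlapping matches."""
    return [(m.start() + 1, m.end()) for m in LEUCINE_ZIPPER.finditer(seq)]


if __name__ == "__main__":
    for start, end in find_zippers(PEPTIDE):
        print(f"LEUCINE_ZIPPER {start}..{end} {PEPTIDE[start - 1:end]}")
```

The four leucines of the hit fall at positions 97, 104, 111 and 118, which is also where the COILS prediction places a coiled-coil segment.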

(No Pfam data available for DKFZphfbr2_2kl9.2) 
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group: cell cycle 

DKFZphfbr2_2kl4 encodes a novel 335 amino acid protein with strong similarity to the Rattus
norvegicus IAG2 "implantation-associated protein" and to the product of the human N33
tumour-suppressor gene.

Tumour-suppressor genes are known to be involved in the control of cell growth and division,
interacting with proteins which control the cell cycle. The N33 gene is significantly
methylated in tumour cells, a mechanism by which tumour-suppressor genes are inactivated in
cancer. In addition, the novel protein contains an RGD cell attachment site. The novel
protein is therefore a putative new tumour suppressor.

The new protein can find application in modulating/blocking the cell cycle and in the therapy 
of tumours. 



strong similarity to human N33 tumor suppressor gene 
complete cDNA, complete cds, EST hits, 

potential start at Bp 30 matches Kozak consensus ANCatgG
potential transmembrane protein (4 TM)

similarity to yeast OST3p (oligosaccharyltransferase gamma chain)

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2241 bp 

Poly A stretch at pos. 2221, no polyadenylation signal found



   1 TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG TTGGCGGTTT
  51 TGGTGTGTCT CTGTGACCAT GGTGGTGGCG CTGCTCATCG TTTGCGACGT
 101 TCCCTCAGCC TCTGCCCAAA GAAAGAAGGA GATGGTGTTA TCAGAAAAGG
 151 TTAGTCAGCT GATGGAATGG ACTAACAAAA GACCTGTAAT AAGAATGAAT
 201 GGAGACAAGT TCCGTCGCCT TGTGAAAGCC CCACCGAGAA ATTACTCCGT
 251 TATCGTCATG TTCACTGCTC TCCAACTGCA TAGACAGTGT GTCGTTTGCA
 301 AGCAAGCTGA TGAAGAATTC CAGATCCTGG CAAACTCCTG GCGATACTCC
 351 AGTGCATTCA CCAACAGGAT ATTTTTTGCC ATGGTGGATT TTGATGAAGG
 401 CTCTGATGTA TTTCAGATGC TAAACATGAA TTCAGCTCCA ACTTTCATCA
 451 ACTTTCCTGC AAAAGGGAAA CCCAAACGGG GTGATACATA TGAGTTACAG
 501 GTGCGGGGTT TTTCAGCTGA GCAGATTGCC CGGTGGATCG CCGACAGAAC
 551 TGATGTCAAT ATTAGAGTGA TTAGACCCCC AAATTATGCT GGTCCCCTTA
 601 TGTTGGGATT GCTTTTGGCT GTTATTGGTG GACTTGTGTA TCTTCGAAGA
 651 AGTAATATGG AATTTCTCTT TAATAAAACT GGATGGGCTT TTGCAGCTTT
 701 GTGTTTTGTG CTTGCTATGA CATCTGGTCA AATGTGGAAC CATATAAGAG
 751 GACCACCATA TGCCCATAAG AATCCCCACA CGGGACATGT GAATTATATC
 801 CATGGAAGCA GTCAAGCCCA GTTTGTAGCT GAAACACACA TTGTTCTTCT
 851 GTTTAATGGT GGAGTTACCT TAGGAATGGT GCTTTTGTGT GAAGCTGCTA
 901 CCTCTGACAT GGATATTGGA AAGCGAAAGA TAATGTGTGT GGCTGGTATT
 951 GGACTTGTTG TATTATTCTT CAGTTGGATG CTCTCTATTT TTAGATCTAA
1001 ATATCATGGC TACCCATACA GCTTTCTGAT GAGTTAAAAA GGTCCCAGAG
1051 ATATATAGAC ACTGGAGTAC TGGAAATTGA AAAACGAAAA TCGTGTGTGT
1101 TTGAAAAGAA GAATGCAACT TGTATATTCT GTATTACCTC TTTTTTTCAA
1151 GTGATTTAAA TAGTTAATCA TTTAACCAAA GAAGATGTGT AGTGCCTTAA
1201 CAAGCAATCC TCTGTCAAAA TCTGAGGTAT TTGAAAATAA TTATCCTCTT
1251 AACCTTCTCT TCCCAGTGAA CTTTATGGAA CATTTAATTT AGTACAATTA
1301 AGTATATTAT AAAAATTGTA AAACTACTAC TTTGTTTTAG TTAGAACAAA
1351 GCTCAAAACT ACTTTAGTTA ACTTGGTCAT CTGATCTTAT ATTGCCTTAT
1401 CCAAAGATGG GGAAAGTAAG TCCTGACCAG GTGTTCCCAC ATATGCCTGT
1451 TACAGATAAC TACATTAGGA ATTCATTCTT AGCTTCTTCA TCTTTGTGTG
1501 GATGTGTATA CTTTACGCAT CTTTCCTTTT GAGTAGAGAA ATTATGTGTG
1551 TCATGTGGTC TTCTGAAAAT GGAACACCAT TCTTCAGAGC ACACGTCTAG
1601 CCCTCAGCAA GACAGTTGTT TCTCCTCCTC CTTGCATATT TCCTACTGCG
1651 CTCCAGCCTG AGTGATAGAG TGAGACTCTG TCTCAAAAAA AAAGTATCTC
1701 TAAATACAGG ATTATAATTT CTGCTTGAGT ATGGTGTTAA CTACCTTGTA
1751 TTTAGAAAGA TTTCAGATTC ATTCCATCTC CTTAGTTTTC TTTTAAGGTG
1801 ACCCATCTGT GATAAAAATA TAGCTTAGTG CTAAAATCAG TGTAACTTAT
1851 ACATGGCCTA AAATGTTTCT ACAAATTAGA GTTTGTCACT TATTCCATTT
1901 GTACCTAAGA GAAAAATAGG CTCAGTTAGA AAAGGACTCC CTGGCCAGGC
1951 GCAGTGACTT ACGCCTGTAA TCTCAGCACT TTGGGAGGCC AAGGCAGGCA
2001 GATCACGAGG TCAGGAGTTC GAGACCATCC TGGCCAACAT GGTGAAACCC
2051 CGTCTCTACT AAAAATATAA AAATTAGCTG GGTGTGGTGG CAGGAGCCTG
2101 TAATCCCAGC TGCACAGGAG GCTGAGGCAC GAGAATCACT TGAACTCAGG
2151 AGATGGAGGT TTCAGTGAGC CGAGATCACG CCACTGCACT CCAGCCTGGC
2201 AACAGAGCGA GACTCCATCT CAAAAAAAAA AAAAAAAAAA A
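The annotation for this clone states that the potential start at Bp 30 matches the Kozak consensus ANCatgG. The following sketch shows one way such a check could be carried out on the first 50 bases of the insert; the helper name matches_kozak and the regular expression are assumptions made for illustration, not the procedure used to produce the annotation.

```python
import re

# First 50 bases of the DKFZphfbr2_2kl4 insert (positions 1-50 of the listing).
INSERT_5PRIME = "TGGGACTTATAGAAGGGAGAGGAGCGAACATGGCAGCGCGTTGGCGGTTT"

# ANCatgG written as a regular expression: A at -3, any base at -2, C at -1,
# the ATG itself, and G at +4.
KOZAK = re.compile(r"A[ACGT]CATGG")


def matches_kozak(seq, atg_pos):
    """Check the 7-base window around an ATG whose A sits at 1-based atg_pos."""
    start = atg_pos - 1 - 3  # 0-based index of the -3 position
    if start < 0:
        return False
    return KOZAK.fullmatch(seq[start:start + 7]) is not None


if __name__ == "__main__":
    print(INSERT_5PRIME[26:33], matches_kozak(INSERT_5PRIME, 30))
```

For this insert the window covering positions 27 to 33 reads AACATGG, which satisfies the stated consensus.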
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BLAST Results 



No BLAST result 

Medline entries 



96299740: 

Structure and methylation-associated silencing of a gene 
within a homozygously deleted region of human chromosome 
band 8p22. 

97243398: 

Tumour-suppressor genes in prostatic oncogenesis: a 
positional approach. 

98334474:

Concordant methylation of the ER and N33 genes in 
glioblastoma multiforme. 



Peptide information for frame 3 



ORF from 30 bp to 1034 bp; peptide length: 335 
Category: strong similarity to known protein 



1 MAARWRFWCV SVTMVVALLI VCDVPSASAQ RKKEMVLSEK VSQLMEWTNK

51 RPVIRMNGDK FRRLVKAPPR NYSVIVMFTA LQLHRQCVVC KQADEEFQIL 

101 ANSWRYSSAF TNRIFFAMVD FDEGSDVFQM LNMNSAPTFI NFPAKGKPKR 

151 GDTYELQVRG FSAEQIARWI ADRTDVNIRV IRPPNYAGPL MLGLLLAVIG 

201 GLVYLRRSNM EFLFNKTGWA FAALCFVLAM TSGQMWNHIR GPPYAHKNPH 

251 TGHVNYIHGS SQAQFVAETH IVLLFNGGVT LGMVLLCEAA TSDMDIGKRK 

301 IMCVAGIGLV VLFFSWMLSI FRSKYHGYPY SFLMS 
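As a consistency check between the nucleotide listing and the peptide, the start of the ORF can be translated directly. The sketch below assumes the standard genetic code and uses only the first 70 bases of the insert; CODON_TABLE, INSERT_5PRIME and translate are illustrative names introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative code applies to this human cDNA).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Positions 1-70 of the DKFZphfbr2_2kl4 insert.
INSERT_5PRIME = (
    "TGGGACTTATAGAAGGGAGAGGAGCGAACATGGCAGCGCGTTGGCGGTTT"
    "TGGTGTGTCTCTGTGACCAT"
)


def translate(dna, start):
    """Translate from 1-based position start until a stop codon or the end."""
    orf = dna[start - 1:]
    residues = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[i:i + 3]]
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


if __name__ == "__main__":
    print(translate(INSERT_5PRIME, 30))
```

Translating from Bp 30 gives MAARWRFWCVSVT, the first thirteen residues of the peptide listed above.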



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_2kl4, frame 3

TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated
protein"; Rattus norvegicus implantation-associated protein (IAG2)
mRNA, partial cds., N = 1, Score = 1560, P = 3.4e-160

PIR:G02297 gene N33 protein - human, N = 1, Score = 1256, P = 5.6e-128

TREMBL:HSN33S11_1 gene: "N33"; product: "N33 protein form 2"; Human
N33 protein form 2 (N33) gene, exon 11 and complete cds., N = 1, Score
= 1252, P = 1.5e-127



>TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein";

Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds.
Length = 308

HSPs:

Score = 1560 (234.1 bits), Expect = 3.4e-160, P = 3.4e-160
Identities = 295/307 (96%), Positives = 299/307 (97%)

Query:  29 AQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDKFRRLVKAPPRNYSVIVMFTALQLHRQCV 88
           AQRKKE VL EKV QLMEWTN+RPVIRMNGDKFR LVKAPPRNYSVIVMFTALQLHRQCV
Sbjct:   2 AQRKKEKVLVEKVIQLMEWTNQRPVIRMNGDKFRPLVKAPPRNYSVIVMFTALQLHRQCV 61

Query:  89 VCKQADEEFQILANSWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPAKGKP 148
           VCKQADEEFQILAN WRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFP KGKP
Sbjct:  62 VCKQADEEFQILANFWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPPKGKP 121

Query: 149 KRGDTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 208
           KR DTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS
Sbjct: 122 KRADTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 181

Query: 209 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 268
           NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE
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Sbjct: 182 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 241 

Query: 269 THIVLLFNGGVTLGMVLLCEAATSDMDIGKRKIMCVAGIGLVVLFFSWMLSIFRSKYHGY 328 

           THIVLLFNGGVTLGMVLLCEAA SDMDIGKR++MC+AGIGLVVLFFSWMLSIFRSKYHGY
Sbjct: 242 THIVLLFNGGVTLGMVLLCEAAASDMDIGKRRMMCIAGIGLVVLFFSWMLSIFRSKYHGY 301

Query: 329 PYSFLMS 335 

PYSFLMS 
Sbjct: 302 PYSFLMS 308 
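The "Identities" figures in these reports can be recomputed from any pair of aligned rows. The sketch below counts identical columns for one 60-column block of the alignment above (Query 149-208 against Sbjct 122-181); the function name alignment_identity is an assumption made for illustration. The percentages printed in this document appear to be truncated rather than rounded (76/156 is reported as 48%), so integer division is used.

```python
# One aligned block from the IAG2 hit: Query 149-208 versus Sbjct 122-181.
QUERY = "KRGDTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS"
SBJCT = "KRADTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS"


def alignment_identity(query, sbjct):
    """Return (identical columns, aligned columns, truncated percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same number of columns")
    identical = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    return identical, len(query), 100 * identical // len(query)


if __name__ == "__main__":
    print(alignment_identity(QUERY, SBJCT))
```

Summing the identical columns over all six blocks of this HSP gives the reported 295/307.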



Pedant information for DKFZphfbr2_2kl4, frame 3 



Report for DKFZphfbr2_2kl4.3

[LENGTH] 335
[MW] 38036.83
[pI] 9.68
[HOMOL] TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein"; Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 1e-161
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOR085w] 4e-14
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YOR085w] 4e-14
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YOR085w] 4e-14
[EC] 2.4.1.119 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 1e-12
[PIRKW] glycosyltransferase 1e-12
[PIRKW] transmembrane protein 6e-69
[PIRKW] hexosyltransferase 1e-12
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[KW] SIGNAL_PEPTIDE 30
[KW] TRANSMEMBRANE 4
[KW] LOW_COMPLEXITY 5.97 %



SEQ MAARWRFWCVSVTMVVALLIVCDVPSASAQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDK 

SEG 

PRD cccceeeeeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccceeeeecccc 

MEM 



SEQ FRRLVKAPPRNYSVIVMFTALQLHRQCVVCKQADEEFQILANSWRYSSAFTNRIFFAMVD

SEG 

PRD ceeeeeccccccceeeehhhhhhccceeeehhhhhhhhhhhhhcccccccccceeeeeec 

MEM 



SEQ FDEGSDVFQMLNMNSAPTFINFPAKGKPKRGDTYELQVRGFSAEQIARWIADRTDVNIRV 

SEG 

PRD cccccceeeecccccccceeeccccccccccceeeeeeeccchhhhhhhhhhhhheeeee 

MEM M 



SEQ IRPPNYAGPLMLGLLLAVIGGLVYLRRSNMEFLFNKTGWAFAALCFVLAMTSGQMWNHIR 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD eccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeec 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ GPPYAHKNPHTGHVNYIHGSSQAQFVAETHIVLLFNGGVTLGMVLLCEAATSDMDIGKRK 

SEG 

PRD ccccccccccccceeeecccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IMCVAGIGLVVLFFSWMLSIFRSKYHGYPYSFLMS

SEG 

PRD eeeecccceeeeeehhhhhhhhhhccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphfbr2_2kl4.3

PS00001 71->75 ASN_GLYCOSYLATION PDOC00001
PS00001 215->219 ASN_GLYCOSYLATION PDOC00001
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006
PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006
PS00008 193->199 MYRISTYL PDOC00008
PS00008 233->239 MYRISTYL PDOC00008
PS00008 259->265 MYRISTYL PDOC00008
PS00008 278->284 MYRISTYL PDOC00008
PS00009 296->300 AMIDATION PDOC00009
PS00016 150->153 RGD PDOC00016

(No Pfam data available for DKFZphfbr2_2kl4.3)
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DKFZphfbr2_3cl8 



group: nucleic acid management 

DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with strong similarity to a Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicase and RNA-dependent ATPase 
from the DEAD box family 
group helicases 

Summary DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with
similarity to DEAD-box subfamily ATP-dependent RNA helicases.
Deletion of the yeast homologue DBP5 is lethal.



strong similarity to RNA helicase and RNA-dependent ATPase from the 
DEAD box family 

complete cDNA, EST hits 
complete cds ATG at Bp 109 

Sequenced by AGOWA 

Locus: /map="87.50 cR from top of Chr16 linkage group"
Insert length: 1713 bp

Poly A stretch at pos. 1696, no polyadenylation signal found



   1 TGGGGTAGTG GGGCTGGAGC AGAGCCTGCC GCGAACCCCC GGAGCCCACG
  51 ATCCCTCGTG CCATCCCTCG AATCCACCAG CACGAGCGTC CCACCCGCGC
 101 CTGGGACCAT GGCCACTGAC TCATGGGCCC TGGCGGTGGA CGAGCAGGAA
 151 GCTGCGGCTG AGTCGTTGAG CAACTTGCAT CTTAAGGAAG AGAAAATCAA
 201 ACCAGATACC AATGGTGCTG TTGTCAAGAC CAATGCCAAT GCAGAGAAGA
 251 CAGATGAAGA AGAGAAAGAG GACAGAGCTG CCCAGTCCTT ACTCAACAAG
 301 CTGATCAGAA GCAACCTTGT TGATAACACA AACCAAGTGG AAGTCCTGCA
 351 GCGGGATCCA AACTCCCCTC TGTACTCGGT GAAGTCTTTT GAAGAGCTTC
 401 GGCTCCCACA GAACTTAATT GCCCAATCTC AGTCTGGTAC TGGTAAAACA
 451 GCTGCCTTCG TGCTGGCCAT GCTTAGCCAA GTAGAACCTG CAAACAAATA
 501 CCCCCAGTGT CTATGTCTCT CCCCAACGTA TGAGCTCGCC CTCCAAACAG
 551 GAAAAGTGAT TGAACAAATG GGCAAATTTT ACCCTGAACT GAAGCTAGCT
 601 TATGCTGTTC GAGGCAATAA ATTGGAAAGA GGCCAGAAGA TCAGTGAGCA
 651 GATTGTCATT GGCACCCCTG GGACTGTGCT GGACTGGTGC TCCAAGCTCA
 701 AGTTCATTGA TCCCAAGAAA ATCAAGGTGT TTGTTCTGGA TGAGGCTGAT
 751 GTCATGATAG CCACTCAGGG CCACCAAGAT CAGAGCATCC GCATCCAGAG
 801 GATGCTGCCC AGGAACTGCC AGATGCTGCT TTTCTCCGCC ACCTTTGAAG
 851 ACTCTGTGTG GAAGTTTGCC CAGAAAGTGG TCCCAGACCC AAACGTTATC
 901 AAACTGAAGC GTGAGGAAGA GACCCTGGAC ACCATCAAGC AGTACTATGT
 951 CCTGTGCAGC AGCAGAGACG AGAAGTTCCA GGCCTTGTGT AACCTCTACG
1001 GGGCCATCAC CATTGCTCAA GCCATGATCT TCTGCCATAC TCGCAAAACA
1051 GCTAGTTGGC TGGCAGCAGA GCTCTCAAAA GAAGGCCACC AGGTGGCTCT
1101 GCTGAGTGGG GAGATGATGG TGGAACAGAG GGCTGCAGTG ATTGAGCGCT
1151 TCCGAGAGGG CAAAGAGAAG GTTTTGGTGA CCACCAACGT GTGTGCCCGC
1201 GGCATTGATG TTGAACAAGT GTCTGTCGTC ATCAACTTTG ATCTTCCCGT
1251 GGACAAGGAC GGGAATCCTG ACAATGAGAC CTACCTGCAC CGGATCGGGC
1301 GCACGGGCCG CTTTGGCAAG AGGGGCCTGG CAGTGAACAT GGTGGACAGC
1351 AAGCACAGCA TGAACATCCT GAACAGAATC CAGGAGCATT TTAATAAGAA
1401 GATAGAAAGA TTGGACACAG ATGATTTGGA CGAGATTGAG AAAATAGCCA
1451 ACTGAGAAGC TCCACCAGCC ACTGATGCCA GCCCTGGCAC TGCCCCTGCA
1501 CAGGAGACAA GTGCGTTCAG GGCACAGGCC CCGACATCAC CCCAAGGACA
1551 ACGGCACAAG TAGAGAGAAA CTACCTACCT CACTTCAAAT TATGTTTGGA
1601 CTTGACAAAA ATGTATGCAA ATGATGGGGG ATGGTAGAAA AAAATTATTT
1651 ACACAACCTT GGAAGATTAG GCATGAATAC ACAGAGATTT ACCTTTAAAA
1701 AAAAAAAAAA AAA



BLAST Results 
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Entry G3649S from database EMBL: 
SHGC-5 30 94 Human Homo sapiens STS CDNA. 
Length = 459 
Minus Strand HSPs: 

Score = 1693 (254.0 bits), Expect = 2.8e-70, P = 2.8e-70 
Identities = 369/387 (95%), Positives = 369/387 (95%) 

Entry G44014 from database EMBLNEW: 

WIAF-3643-STS Human THudson SANGER Homo sapiens STS genomic, sequence 
tagged site. 

Score = 901, P = 2.3e-35, identities = 183/185 



Medline entries 



94192995: 

Gene 1994 Mar 25;140(2):171-177

Mouse erythroid cells express multiple putative RNA helicase genes exhibiting
high sequence conservation from yeast to mammals.



Peptide information for frame 1 



ORF from 109 bp to 1452 bp; peptide length: 448 
Category: strong similarity to known protein 
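The ORF coordinates and peptide lengths quoted for the clones in this part of the listing are mutually consistent if the coordinates exclude the stop codon. A minimal check, using only figures stated in the document (the dictionary ORFS and the function peptide_length are illustrative names):

```python
# (ORF start, ORF end, stated peptide length), all as given in the listing.
ORFS = {
    "DKFZphfbr2_2kl4": (30, 1034, 335),
    "DKFZphfbr2_3cl8": (109, 1452, 448),
    "DKFZphfbr2_3f16": (150, 530, 127),
    "DKFZphfbr2_3g8": (40, 573, 178),
}


def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


if __name__ == "__main__":
    for clone, (start, end, stated) in ORFS.items():
        print(clone, peptide_length(start, end), stated)
```

For example, 1452 - 109 + 1 = 1344 bases, which is 448 codons, matching the stated peptide length.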



1 MATDSWALAV DEQEAAAESL SNLHLKEEKI KPDTNGAVVK TNANAEKTDE 

51 EEKEDRAAQS LLNKLIRSNL VDNTNQVEVL QRDPNSPLYS VKSFEELRLP 

101 QNLIAQSQSG TGKTAAFVLA MLSQVEPANK YPQCLCLSPT YELALQTGKV 

151 IEQMGKFYPE LKLAYAVRGN KLERGQKISE QIVIGTPGTV LDWCSKLKFI 

201 DPKKIKVFVL DEADVMIATQ GHQDQSIRIQ RMLPRNCQML LFSATFEDSV 

251 WKFAQKVVPD PNVIKLKREE ETLDTIKQYY VLCSSRDEKF QALCNLYGAI 

301 TIAQAMIFCH TRKTASWLAA ELSKEGHQVA LLSGEMMVEQ RAAVIERFRE 

351 GKEKVLVTTN VCARGIDVEQ VSVVINFDLP VDKDGNPDNE TYLHRIGRTG 

401 RFGKRGLAVN MVDSKHSMNI LNRIQEHFNK KIERLDTDDL DEIEKIAN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3cl8, frame 1 

PIR:I49731 RNA helicase - mouse, N = 2, Score = 1758, P = 3.8e-223

TREMBL:AF005239_1 gene: "Dbp80"; product: "DEAD-box helicase";
Drosophila melanogaster DEAD-box helicase (Dbp80) mRNA, complete cds.,
N = 2, Score = 1142, P = 1.8e-125

SWISSPROT:YB66_SCHPO PUTATIVE ATP-DEPENDENT RNA HELICASE C12C2.06., N =
2, Score = 911, P = 5.5e-103

PIR:S66920 probable RNA helicase CA5/6 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 887, P = 1.9e-98

>PIR:I49731 RNA helicase - mouse 
Length = 478 

HSPs: 

Score = 1758 (263.8 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223 
Identities = 338/349 (96%), Positives = 349/349 (100%) 

Query: 100 PQNLIAQSQSGTGKTAAFVLAMLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYP 159

           PQNLIAQSQSGTGKTAAFVLAMLS+VEPA++YPQCLCLSPTYELALQTGKVIEQMGKF+P
Sbjct: 130 PQNLIAQSQSGTGKTAAFVLAMLSRVEPADRYPQCLCLSPTYELALQTGKVIEQMGKFHP 189

Query: 160 ELKLAYAVRGNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 219

           ELKLAYAVRGNKLERGQK+SEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT
Sbjct: 190 ELKLAYAVRGNKLERGQKVSEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 249 

Query: 220 QGHQDQSIRIQRMLPRNCQMLLFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQY 279 
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QGHQDQSIRIQR++PRNCQMLLFSATFEDSVWKFAQKVVPDPN+IKLKREEETLDTIKQY 



Sbjct: 250 QGHQDQSIRIQRIVPRNCQMLLFSATFEDSVWKFAQKVVPDPNIIKLKREEETLDTIKQY 309

Query: 280 YVLCSSRDEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 339
           YVLC++R+EKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE
Sbjct: 310 YVLCNNREEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 369

Query: 340 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 399
           QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT
Sbjct: 370 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 429

Query: 400 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 448
           GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN
Sbjct: 430 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 478




Score = 419 (62.9 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223
Identities = 94/136 (69%), Positives = 104/136 (76%)

Query:   1 MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 60
           MATDSWALAVDEQEAA +S+S+L +KEEK K DTNG V+KT+  AEKT+EEEKEDRAAQS
Sbjct:   1 MATDSWALAVDEQEAAVKSMSSLQIKEEKAKSDTNG-VIKTSTTAEKTEEEEKEDRAAQS 59

Query:  61 LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRL-PQNL IAQSQSGTGKTAA 116
           LLNKLIRSNLVDNTNQVEVLQRDP+SPLYSVKSFEELRL PQ L A + K
Sbjct:  60 LLNKLIRSNLVDNTNQVEVLQRDPSSPLYSVKSFEELRLKPQLLQGVYAMGFNRPSKIQE 119

Query: 117 FVLAMLSQVEPANKYPQ 133
           L M+ P N Q
Sbjct: 120 NALPMMLAEPPQNLIAQ 136





Pedant information for DKFZphfbr2_3cl8, frame 1 



Report for DKFZphfbr2_3cl8.1

[LENGTH] 448
[MW] 50490.07
[pI] 5.83
[HOMOL] PIR:I49731 RNA helicase - mouse 0.0
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 1e-102
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YJL138c] 1e-63
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YJL138C] 1e-63
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 2e-49
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 9e-
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-43
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-39
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-35
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 9e-27
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 8e-26
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 1e-23
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 9e-08
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-05
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190C] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIR002c] 7e-04
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-64
[PIRKW] RNA binding 1e-64
[PIRKW] DEAD box 4e-64
[PIRKW] transmembrane protein 3e-22
[PIRKW] DNA binding 2e-32
[PIRKW] ATP 1e-101
[PIRKW] purine nucleotide binding 4e-64
[PIRKW] P-loop 1e-101
[PIRKW] hydrolase 4e-43
[PIRKW] protein biosynthesis 1e-64
[PIRKW] ATP binding 2e-35
[SUPFAM] WW repeat homology 3e-29
[SUPFAM] translation initiation factor eIF-4A 1e-64
[SUPFAM] DEAD/H box helicase homology 1e-101
[SUPFAM] DNA helicase recG 2e-06
[SUPFAM] unassigned DEAD/H box helicases 1e-101
[SUPFAM] ATP-dependent RNA helicase DBP1 9e-33
[SUPFAM] ATP-dependent RNA helicase DHH1 4e-48
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 3e-29
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta







SEQ MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 

PRD ccchhhhhhhhhhhhhhhhcccchhhhhhhcccccceeeeeehhhhhhhhhhhhhhhhhh 

SEQ LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGKTAAFVLA 

PRD hhhhhhhhhcccccceeeeeeccccccceeehhhhhhhhccceeeeeccccccchhhhhh 

SEQ MLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYPELKLAYAVRGNKLERGQKISE 

PRD hhhhhhhhhccceeeeeccchhhhhhhhhhhhhhccccccccceeeccccchhhhhhhhe 

SEQ QIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIATQGHQDQSIRIQRMLPRNCQML 

PRD eeeecccccchhhhhhhhhhcccceeeeeecchhhhhhhccchhhhhhhhhhccccceee 

SEQ LFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQYYVLCSSRDEKFQALCNLYGAI 

PRD eeeccccchhhhhhhhhhcccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhch 

SEQ TIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTN

PRD hhhhhheeecchhhhhhhhhhhhhccceeeeecccchhhhhhhhhhhhccccceeeeeec 

SEQ VCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFGKRGLAVNMVDSKHSMNI

PRD ccccccceeeeeeeeecccccccccccccceeeeeecccccccccceeeeeeeccchhhh 

SEQ LNRIQEHFNKKIERLDTDDLDEIEKIAN 

PRD hhhhhhhhhhhccccccccchhhhhccc 



Prosite for DKFZphfbr2_3cl8.1

PS00001 389->393 ASN_GLYCOSYLATION PDOC00001
PS00002 109->113 GLYCOSAMINOGLYCAN PDOC00002
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005
PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 311->314 PKC_PHOSPHO_SITE PDOC00005
PS00005 399->402 PKC_PHOSPHO_SITE PDOC00005
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006
PS00006 92->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006
PS00006 284->288 CK2_PHOSPHO_SITE PDOC00006
PS00008 110->116 MYRISTYL PDOC00008
PS00008 175->181 MYRISTYL PDOC00008
PS00008 185->191 MYRISTYL PDOC00008
PS00008 385->391 MYRISTYL PDOC00008
PS00008 406->412 MYRISTYL PDOC00008
PS00009 402->406 AMIDATION PDOC00009



Pfam for DKFZphfbr2_3cl8 . 1 



HMM_NAME DEAD and DEAH box helicases

HMM 'gLpPWILRnlyeMGFEkPTPIQQqAIPilLeG. . . . RDVMACAQTGSGK
++ ++ +N ++ P E+ +++A++Q+G+GK
Query 65 LIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGK 113

HMM TAAF1I PMLQHI DwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHM
TAAF++ ML+++ + + PQ +L L+PT ELA+Q+ ++++++GK++
Query 114 TAAFVLAMLSQVEPAN--KYPQ CLCLSPTYELALQTGKVIEQMGKFY 158

HMM nglRImd YGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIER. gtldLDr
+ ++ + ++ ++ +++ +++ +IVI+TPG ++D + +D + +
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Query 159 PELKLAYAVR GNKLERGQKISEQI VIGTPGTVLDWCSKLKFIDPKK 204 

HMM TeMLVMDEADRMLD.MGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqE

I+++V+DEAD M+ +G +DQ RI R++P +N Q ++FSAT+ D++ +
Query 205 IKVFVLDEADVMIATQGHQDQSIRIQRMLP--RNCQMLLFSATFEDSVWK 252

HMM LARrFMRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

+A ++ +P I ++++E T++ +IKQ+Y+ + + ++KF +LC+L++ 
Query 253 FAQKVVPDPNVIKLKREEETLD-TIKQYYVLCSSRDEKFQALCNLYG 298 



HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR 
+L+ +L+++G +V+ + G M+ E+R ++++F++G+ +VL++T+V +R 
Query 316 SWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTNVCAR 364 

HMM GIDIPdVNHVINYDM. . . .PWNPEq. . YIQRIGRTgRIG* 

GID+++V++VIN+D+ + NP++ Y++RIGRTGR+G
Query 365 GIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFG 403 



Medline 

PMID: 10322435 

"Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families." de la Cruz J, Kressler D, Linder P
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DKFZphfbr2_3f 16 
group: brain derived 

DKFZphfbr2_3f16 encodes a novel 127 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown

Insert length: 1514 bp

Poly A stretch at pos. 1454, polyadenylation signal at pos. 1434



   1 GGGGGGACTG GAGAAGGGAG GCGGCGGGCG AAGCGCACGT CGAGCGGGGG
  51 AGCGGCGCTG CCTGTGGAGA TCCGCGGAGG CCGACAGGAT TCGTTGGCTG
 101 CCGTCCCCGC TGCTGTGCAT TGGGTTAAAA ACGACAACCA ACATCAGCCA
 151 TGAAAGATCC AAGTCGCAGC AGTACTAGCC CAAGCATCAT CAATGAAGAT
 201 GTGATTATTA ACGGTCATTC TCATGAAGAT GACAATCCAT TTGCAGAGTA
 251 CATGTGGATG GAAAATGAAG AAGAATTCAA CAGACAAATA GAAGAGGAGT
 301 TATGGGAAGA AGAATTTATT GAACGCTGTT TCCAAGAAAT GCTGGAAGAG
 351 GAAGAAGAGC ATGAATGGTT TATTCCAGCT CGAGATCTCC CACAAACTAT
 401 GGACCAAATC CAAGACCAGT TTAATGACCT TGTTATCAGT GAAGGCTCTT
 451 CTCTGGAAGA TCTTGTGGTC AAGAGCAATC TGAATCCAAA TGCAAAGGAG
 501 TTTGTTCCTG GGGTGAAGTA CGGAAATATT TGAGTAGACG GGGCCCTCTT
 551 TTGGTGGATG TAGCACAATT TCCACACTGT GAAGGCAGTA TTAGAAGACT
 601 TAATTGTAAA AGCACTCTTG TCACTGTGTT ACACTTATGC ATTGCCAAAG
 651 TTTTTGTTAG TCTTGCATGC TTAATAAAAG TGCTGAGACT GTTACTAAGT
 701 AAAAAGCTGT CAAACATTTA CTGAAAATAG AATTGGCCCC ATGGCTTGAT
 751 GTGAAGACAG CAAGGAAAGA AGCACCAGTC AAGTTGTGAA CAAGCACCAA
 801 ATTAAAAGAC CTAAACCTTA CCAAATTGTC TTTTTTTGAG GCTAATCTAT
 851 CACTTGTTAA TGTCTAAACT TTAAAATCAG TACATTTAAT TTGAGTTCCA
 901 ACTGTTAAGC ATATTICTCA GACTTAAATT TGATTATGTC CCCATCAAAA
 951 AGAATCTCCA TTTTCTGAAG GTCTGTTAGT TAATTTGAGA TAATTTGTTA
1001 AAGGCAAGTA TGTCATATTA CTGAGGCTAC AAGTTAGTCA GCAGATGAGT
1051 GCCAGTCCAG CCTTTTCCGG TATGTTATTG TTAGAAATAT TGAGTTCTAA
1101 TGTTACATCT GAGGAAGTAT GTAATTTGAG AATTGTAACT TCTAAGGGAT
1151 TCACTGCATC ATAGCTATGC CTGTATGGAG TCTAACATAI GACCAATACC
1201 AACCCATAAT CCAGCTGAAC AAAGATACTG TAACATTATG ATTTGAGTGG
1251 TGCTTTTCCT TGCTTTGTTA ACCATCACGA GAGTCTGCAG CACAACTTTT
1301 AACAAAGCTA GAACAGTTTT GGCTTCTTAA ACTTCATATT TGGGTAGGTT
1351 AAGCTGCCAT ACGTGTTCAG TGTGAATAGT GTTTAAGTTG AAAATATTGT
1401 AAAAAAATTA TATTTTTTCA AAAATATTTA AAAAAATAAA TAATAGTAGA
1451 ACTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGAAAAA
1501 AAAAAAAAAA AAAA



BLAST Results

No BLAST result

Medline entries

No Medline entry



Peptide information for frame 3 



ORF from 150 bp to 530 bp; peptide length: 127 
Category: putative protein 



1 MKDPSRSSTS PSIINEDVII NGHSHEDDNP FAEYMWMENE EEFNRQIEEE
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51 LWEEEFIERC FQEMLEEEEE HEWFIPARDL PQTMDQIQDQ FNDLVI SEGS 
101 SLEDLVVKSN LNPNAKEFVP GVKYGNI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3f16, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_3f16, frame 3



Report for DKFZphfbr2_3f16.3



[LENGTH] 127 

[MW] 14998.41 

[pI] 4.04

[BLOCKS] BL01269D 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 27.56 % 

SEQ MKDPSRSSTSPSIINEDVIINGHSHEDDNPFAEYMWMENEEEFNRQIEEELWEEEFIERC 

SEG xxxxxxxxxxxxxxxxxxxxxxx 



PRD ccccccccccccccccceeeecccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ FQEMLEEEEEHEWFIPARDLPQTMDQIQDQFNDLVISEGSSLEDLVVKSNLNPNAKEFVP

SEG xxxxxxxxxxxx

PRD hhhhhhhhhhhhhccccccccchhhhhhhhhcceeeecccccceeeeecccccccccccc

SEQ GVKYGNI 

SEG 

PRD ccccccc 



Prosite for DKFZphfbr2_3f16.3

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00006 100->104 CK2_PHOSPHO_SITE PDOC00006 

PS00008 121->127 MYRISTYL PDOC00008 
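The three hits above can be reproduced by scanning the 127-residue peptide with regular expressions. The sketch below assumes the PROSITE consensus patterns [ST]-x(2)-[DE] for PS00006 and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for PS00008; the names PEPTIDE, PATTERNS and scan are illustrative. Only start positions are returned, because the end positions printed in the report run one residue past the matched pattern.

```python
import re

# DKFZphfbr2_3f16.3 peptide, copied from the SEQ rows of the report.
PEPTIDE = (
    "MKDPSRSSTSPSIINEDVIINGHSHEDDNPFAEYMWMENEEEFNRQIEEELWEEEFIERC"
    "FQEMLEEEEEHEWFIPARDLPQTMDQIQDQFNDLVISEGSSLEDLVVKSNLNPNAKEFVP"
    "GVKYGNI"
)

# Assumed PROSITE consensus patterns, rewritten as regular expressions.
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",             # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",    # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(seq):
    """Return 1-based start positions of non-overlapping matches per motif."""
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, seq)]
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, starts in scan(PEPTIDE).items():
        print(name, starts)
```

Because re.finditer reports non-overlapping matches, the overlapping candidate starting at residue 101 is not listed separately, which agrees with the two CK2 entries in the table.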



(No Pfam data available for DKFZphfbr2_3f16.3)
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DKFZphfbr2_3g8 



group: metabolism 

DKFZphfbr2_3g8.1 encodes a novel 178 amino acid protein with similarity to yeast ARD1 protein.

In yeast, ARD1 and NAT1 are required for the expression of an N-terminal protein
acetyltransferase 1. NAT1 controls full repression of the silent mating type locus HML,
sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1 complex. The new
protein could be part of this or another NAT complex.

The new protein can find application in modulating NAT assembly and action and may therefore
be important in the metabolism of drugs and environmental mutagens.



strong similarity to N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 homolog 
complete cDNA, complete cds? start at Bp 40, EST hits 
Sequenced by AGOWA 
Locus: /map="20" 
Insert length: 1030 bp 

Poly A stretch at pos. 1013, no polyadenylation signal found



   1 TGGGCTTGGC GAACGGTCTT CGGAAGCGGC GGCGGCGCGA TGACCACGCT
  51 ACGGGCCTTT ACCTGCGACG ACCTGTTCCG CTTCAACAAC ATTAACTTGG
 101 ATCCACTTAC AGAAACTTAT GGGATTCCTT TCTACCTACA ATACCTCGCC
 151 CACTGGCCAG AGTATTTCAT TGTTGCAGTG GCACCTGGTG GAGAATTAAT
 201 GGGTTATATT ATGGGTAAAG CAGAAGGCTC AGTAGCTAGG GAAGAATGGC
 251 ACGGGCACGT CACAGCTCTG TCTGTTGCCC CAGAATTTCG ACGCCTTGGT
 301 TTGGCTGCTA AACTTATGGA GTTACTAGAG GAGATTTCAG AAAGAAAGGG
 351 TGGGTTTTTT GTGGATCTCT TTGTAAGAGT ATCTAACCAA GTTGCAGTTA
 401 ACATGTACAA GCAGTTGGGC TACAGTGTAT ATAGGACGGT CATAGAGTAC
 451 TATTCGGCCA GCAACGGGGA GCCTGATGAG GACGCTTATG ATATGAGGAA
 501 AGCACTTTCC AGGGATACTG AGAAGAAATC CATCATACCA TTACCTCATC
 551 CTGTGAGGCC TGAAGACATT GAATAACCCT GGGCAGTGGT TCTTAGGCAG
 601 ATACTCTAGA TGCTTTATGG ACAATATTAT TTTCATTGGA TGATTCTGGA
 651 GCTCTATTAG GAGAAAAGTA ATCATTTTAG GTCTTAAAGA CTTCAAGAAA
 701 ATACAGGTTA TCAATTTATT TTAAATCTCA TTGTTTCCAG TTAGCAATAT
 751 CATACCTATT AAAGCTGTTC ATTGTAACAA AATTCAATCA AAAAGGCAGC
 801 TAGGTCAGAA GGAAACATAC CACTCTCATG GTTCATAGTA TTCACTGTAT
 851 GTATGCTAGG GAAAAGACTT GCTCCAGTCT CCTCCTCACT TCTGTGCCTG
 901 AGAACCACTG CTGCATATAT TTGTTTTTAA ATTTTGTATT GAACTGTTAA
 951 TTGAAGCTTT AAAAGCATAT ATGAAATGTA TAAATCTAAG ATGTATAATA
1001 CATTATTGAC TCCAAAAAAA AAAAAAAAAA



BLAST Results 



Entry HSG0101 from database EMBL: 
human STS SHGC-35956. 
Length = 401 
Minus Strand HSPs: 

Score = 1417 (212.6 bits), Expect = 9.3e-58, P = 9.3e-58 
Identities = 301/311 (96%) 
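
The percentage printed beside the Identities count appears to be the ratio of matching to aligned positions, truncated to a whole number (301/311 is about 96.8%, reported as 96%). The following Python sketch reproduces that arithmetic. The function name `percent_floor` is hypothetical, and the truncation rule is an inference from the figures in this listing, not a statement about how the BLAST program itself is implemented.

```python
def percent_floor(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage, truncated (not rounded).

    Assumption: the listing truncates, since 301/311 = 96.78% is shown as 96%.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


if __name__ == "__main__":
    # Identities = 301/311 from the HSP against the STS entry above
    print(percent_floor(301, 311))
```

Integer arithmetic is used so that the result does not depend on floating-point rounding.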



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 40 bp to 573 bp; peptide length: 178 
Category: strong similarity to known protein 
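
The ORF start given above can be cross-checked against the cDNA by translating from bp 40 with the standard genetic code. The Python sketch below does this for the first 20 codons. The names `CODON_TABLE`, `translate` and `CDNA_5PRIME` are hypothetical, and the 100 nt string is assumed to be the first two rows of the insert sequence with the spacing removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: bases 1-100 of the insert, taken from the first two sequence rows.
CDNA_5PRIME = (
    "TGGGCTTGGCGAACGGTCTTCGGAAGCGGCGGCGGCGCGATGACCACGCT"
    "ACGGGCCTTTACCTGCGACGACCTGTTCCGCTTCAACAACATTAACTTGG"
)


def translate(dna: str, start_bp: int, n_codons: int) -> str:
    """Translate n_codons codons beginning at the 1-based position start_bp."""
    i = start_bp - 1  # convert to a 0-based index
    codons = (dna[i + 3 * k: i + 3 * k + 3] for k in range(n_codons))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    # Should reproduce the first 20 residues of the peptide listed below.
    print(translate(CDNA_5PRIME, 40, 20))
```

Only the 5' end is translated here; extending the check to the whole ORF would require the complete 1030 bp sequence as a single string.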



1 MTTLRAFTCD DLFRFNNINL DPLTETYGIP FYLQYLAHWP EYFIVAVAPG 
51 GELMGYIMGK AEGSVAREEW HGHVTALSVA PEFRRLGLAA KLMELLEEIS 






101 ERKGGFFVDL FVRVSNQVAV NMYKQLGYSV YRTVIEYYSA SNGEPDEDAY 
151 DMRKALSRDT EKKSIIPLPH PVRPEDIE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3g8, frame 1 

TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal 
acetyltransferase complex subunit"; S.pombe chromosome III cosmid 
c16C4., N = 1, Score = 475, P = 3.2e-45 

SWISSPROT:ARDH_LEIDO N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT 
HOMOLOG., N = 1, Score = 451, P = 1.1e-42 

PIR:S69021 hypothetical protein YPR131C - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 382, P = 2.3e-35 



>TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal 
acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 
Length = 180 

HSPs: 

Score = 475 (71.3 bits), Expect = 3.2e-45, P = 3.2e-45 
Identities = 96/165 (58%), Positives = 118/165 (71%) 

Query: 1 MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGE--LMGYIM 58 

MT R F DLF FNNINLDPLTET+ I FYL YL WP +V + + LMGYIM 
Sbjct: 1 MTDTRKFKATDLFSFNNINLDPLTETFNTSFYLSYLNKKPSLCVVQESDLSDPTLMGYIM 60 

Query: 59 GKAEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQV 118 

GK+EG+ +EWH HVTA++VAP RRLGLA +M+ LE + + FFVDLFVR SN + 
Sbjct: 61 GKSEGT--GKEWHTHVTAITVAPNSRRLGLARTMMDYLETVGNSENAFFVDLFVRASNAL 118 

Query: 119 AVNMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSI 165 

A++ YK LGYSVYR VI YYS +G+ DED++DMRK LSRD ++SI 
Sbjct: 119 AIDFYKGLGYSVYRRVIGYYSNPHCK-DEDSFDMRKPLSRDVNRESI 164 



Pedant information for DKFZphfbr2_3g8, frame 1 



Report for DKFZphfbr2_3g8.1 



[LENGTH] 178 

[MW] 20338.24 

[pI] 5.06 

[HOMOL] TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal 
acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 7e-47 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YPR131c] 6e-37 

[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YHR013c] 4e-14 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YHR013C] 4e-14 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YHR013C] 4e-14 

[FUNCAT] r general function prediction [M. jannaschii, MJ1530] 6e-09 

[PIRKW] acyltransferase 1e-12 

[SUPFAM] arrest-defective protein 1 1e-12 

[SUPFAM] Escherichia coli peptide N-acetyltransferase rimI 1e-07 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] Alpha_Beta 



SEQ MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK 

PRD ccccccccccchhhhhhcccccccccccchhhhhhcccccceeeeeeccccceeeehhhh 

SEQ AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV 

PRD hcccccccccccceeeeehhhhhhhhcchhhhhhhhhhhhhhccceeeeeeeecchhhhh 

SEQ NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE 

PRD hhhhhhcccchhhhhhccccccccccchhhhhhhhhhhhhhhhhcccccccccccccc 
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PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 

PS00005 100->103 PKC_PHOSPHO_SITE PDOC00005 

PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005 

PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006 

PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006 

PS00006 141->145 CK2_PHOSPHO_SITE PDOC00006 
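
The start positions in this Prosite list can be reproduced by scanning the 178 residue peptide for the two consensus patterns, `[ST]-x-[RK]` for PS00005 and `[ST]-x(2)-[DE]` for PS00006. The Python sketch below is one way to do that. The names `PEPTIDE`, `MOTIFS` and `scan` are hypothetical, the peptide is assumed to be the three SEQ lines above joined together, and the `start->end` ranges in the listing seem to give an end position one past the last matched residue.

```python
import re

# Assumption: the full peptide, joined from the three SEQ lines of the Pedant report.
PEPTIDE = (
    "MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK"
    "AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV"
    "NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE"
)

# Prosite consensus patterns rewritten as regular expressions.
MOTIFS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",   # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",  # PS00006: [ST]-x(2)-[DE]
}


def scan(peptide: str, pattern: str) -> list[int]:
    """Return the 1-based start of every match; the lookahead keeps overlapping hits."""
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", peptide)]


if __name__ == "__main__":
    for name, pattern in MOTIFS.items():
        print(name, scan(PEPTIDE, pattern))
```

A zero-width lookahead is used because Prosite reports overlapping sites, which a plain `re.finditer` over the pattern would skip.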



(No Pfam data available for DKFZphfbr2_3g8.1) 
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DKFZphfbr2_312 



group: brain derived 

DKFZphfbr2_312 encodes a novel 589 amino acid protein with weak similarity to the S. cerevisiae 
ubiquitin-like protein DSK2. 

Pfam predicts similarity of this protein to the ubiquitin family. There are no informative BLAST 
results and no predictive Prosite or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to ubiquitin-like protein DSK2 yeast 
complete cDNA, complete cds, EST hits 

Dsk2p is involved in spindle pole body (SPB) duplication, SPB - centromere 
strong similarity to HRIHFB2157 human mRNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2978 bp 

Poly A stretch at pos. 2958, polyadenylation signal at pos. 2924 
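
The two positions on this line can be recovered from the 3' end of the insert with a simple scan. The Python sketch below looks for the last run of at least ten A residues and for the first AATAAA or ATTAAA hexamer. The function name `polya_features`, the `min_run` threshold and the variable `TAIL` are hypothetical, `TAIL` is assumed to be bases 2901 to 2978 of the sequence listed below, and the reported positions are assumed to be 0-based offsets, which is what the figures in this listing suggest.

```python
import re
from typing import Optional, Tuple

# Assumption: bases 2901-2978 of the insert (rows 2901 and 2951), spacing removed.
TAIL = (
    "TCTCCCATAGGTAGTTTAACAGGCATTAAAATTTGTAATTGAAATGTTGC"
    "TTTCACTCAAAAAAAAAAAAAAAAAAAA"
)


def polya_features(seq: str, offset: int = 0, min_run: int = 10
                   ) -> Tuple[Optional[int], Optional[int]]:
    """Return (poly-A start, signal start) as 0-based offsets into the full insert.

    offset is the 0-based index of seq[0] within the full insert.
    """
    runs = list(re.finditer(r"A{%d,}" % min_run, seq))
    polya = offset + runs[-1].start() if runs else None
    signal_match = re.search(r"A[AT]TAAA", seq)  # AATAAA or the ATTAAA variant
    signal = offset + signal_match.start() if signal_match else None
    return polya, signal


if __name__ == "__main__":
    print(polya_features(TAIL, offset=2900))
```

Searching only the last rows keeps the example short; on a full-length sequence the signal search would normally be restricted to a window upstream of the poly-A run.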



1 GGGGGGAGGA AGCGGTGGCT GCTGCGGATG TCGGTGTGAG CGAGCGGCGC 
51 CTGAACACAC GGCGGCTGCC GAGCGCCTGA CCCGGGCCTG CGCCAGAGCC 
101 TGCACCGAGC TCCGGGGCCC CACACCCGCT ACGGTGGCCC TGCGCCCGTT 
151 GCTACTGAGG CGGCGTGCTC TGCATTCTTC GCTGTCCAGG CCTGCCGGCT 
201 CTGGTGTCTG CTGGCTCCTC CTTGCTCGCC TGCTCCCTCC TGCTTGCCTG 
251 AGTCACCGCC GCCGCCGCCG CCACAGCCAT GGCCGAGAGT GGTGAAAGCG 
301 GCGGTCCTCC GGGCTCCCAG GATAGCGCCG CCGGAGCCGA AGGTGCTGGC 
351 GCCCCCGCGG CCGCTGCCTC CGCGGAGCCC AAAATCATGA AAGTCACCGT 
401 GAAGACCCCG AAGGAAAAGG AGGAATTCGC CGTGCCCGAG AATAGCTCCG 
451 TCCAGCAGTT TAAGGAAGAA ATCTCTAAAC GTTTTAAATC ACATACTGAC 
501 CAACTTGTGT TGATATTTGC TGGAAAAATT TTGAAAGATC AAGATACCTT 
551 GAGTCAGCAT GGAATTCATG ATGGACTTAC TGTTCACCTT GTCATTAAAA 
601 CACAAAACAG GCCTCAGGAT CATTCAGCTC AGCAAACAAA TACAGCTGGA 
651 GGCAATGTTA CTACATCATC AACTCCTAAT AGTAACTCTA CATCTGGTTC 
701 TGCTACTAGC AACCCTTTTG GTTTAGGTGG CCTTGGGGGA CTTGCAGGTC 
751 TGAGTAGCTT GGGTTTGAAT ACTACCAACT TCTCTGAACT ACAGAGTCAG 
801 ATGCAGCGAC AACTTTTGTC TAACCCTGAA ATGATGGTCC AGATCATGGA 
851 AAATCCCTTT GTTCAGAGCA TGCTCTCAAA TCCTGACCTG ATGAGACAGT 
901 TAATTATGGC CAATCCACAA ATGCAGCAGT TGATACAGAG AAATCCAGAA 
951 ATTAGTCATA TGTTGAATAA TCCAGATATA ATGAGACAAA CGTTGGAACT 
1001 TGCCAGGAAT CCAGCAATGA TGCAGGAGAT GATGAGGAAC CAGGACCGAG 
1051 CTTTGAGCAA CCTAGAAAGC ATCCCAGGGG GATATAATGC TTTAAGGCGC 
1101 ATGTACACAG ATATTCAGGA ACCAATGCTG AGTGCTGCAC AAGAGCAGTT 
1151 TGGTGGTAAT CCATTTGCTT CCTTGGTGAG CAATACATCC TCTGGTGAAG 
1201 GTAGTCAACC TTCCCGTACA GAAAATAGAG ATCCACTACC CAATCCATGG 
1251 GCTCCACAGA CTTCCCAGAG TTCATCAGCT TCCAGCGGCA CTGCCAGCAC 
1301 TGTGGGTGGC ACTACTGGTA GTACTGCCAG TGGCACTTCT GGGCAGAGTA 
1351 CTACTGCGCC AAATTTGGTG CCTGGAGTAG GAGCTAGTAT GTTCAACACA 
1401 CCAGGAATGC AGAGCTTGTT GCAACAAATA ACTGAAAACC CACAACTGAT 
1451 GCAAAACATG TTGTCTGCCC CCTACATGAG AAGCATGATG CAGTCACTAA 
1501 GCCAGAATCC TGACCTTGCT GCACAGATGA TGCTGAATAA TCCCCTATTT 
1551 GCTGGAAATC CTCAGCTTCA AGAACAAATG AGACAACAGC TCCCAACTTT 
1601 CCTCCAACAA ATGCAGAATC CTGATACACT ATCAGCAATG TCAAACCCTA 
1651 GAGCAATGCA GGCCTTGTTA CAGATTCAGC AGGGTTTACA GACATTAGCA 
1701 ACGGAAGCCC CGGGCCTCAT CCCAGGGTTT ACTCCTGGCT TGGGGGCATT 
1751 AGGAAGCACT GGAGGCTCTT CGGGAACTAA TGGATCTAAC GCCACACCTA 
1801 GTGAAAACAC AAGTCCCACA GCAGGAACCA CTGAACCTGG ACATCAGCAG 
1851 TTTATTCAGC AGATGCTGCA GGCTCTTGCT GGAGTAAATC CTCAGCTACA 
1901 GAATCCAGAA GTCAGATTTC AGCAACAACT GGAACAACTC AGTGCAATGG 
1951 GATTTTTGAA CCGTGAAGCA AACTTGCAAG CTCTAATAGC AACAGGAGGT 
2001 GATATCAATG CAGCTATTGA AAGGTTACTG GGCTCCCAGC CATCATAGCA 
2051 GCATTTCTGT ATCTTGAAAA AATGTAATTT ATTTTTGATA ACGGCTCTTA 
2101 AACTTTAAAA TACCTGCTTT ATTTCATTTT GACTCTTGGA ATTCTGTGCT 
2151 GTTATAAACA AACCCAATAT GATGCATTTT AAGGTGGAGT ACAGTAAGAT 
2201 GTGTGGGTTT TTCTGTATTT TTCTTTTCTG GAACAGTGGG AATTAAGGCT 
2251 ACTGCATGCA TCACTTCTGC ATTTATTGTA ATTTTTTAAA AACATCACCT 
2301 TTTATAGTTG GGTGACCAGA TTTTGTCCTG CATCTGTCCA GTTTATTTGC 
2351 TTTTTAAACA TTAGCCTATG GTAGTAATTT ATGTAGAATA AAAGCATTAA 
2401 AAAGAAGCAA ATCATTTGCA CTCTATAATT TGTGGTACAG TATTGCTTAT 
2451 TGTGACTTTG GCATGCATTT TTGCAAACAA TGCTGTAAGA TTTATACTAC 
2501 TGATAATTTT GTTTTATTTG TATACAATAT AGAGTATGCA CATTTGGGAC 
2551 TGCATTTCTG GAAACATACT GCAATAGGCT CTCTGAGCAA AACACCTGTA 
2601 ACTAAAAAAG TGAAGATAAG AAAATACTCT TAAAGCTGAG TATTTCCTAA 
2651 TTGTATAGAA TCTTACAGCA TCTTTGACAA ACATCTCCCA GCAAAAGTGC 
2701 CGGTTAGTCA GGTTTGTTGA AAATACAGTA GAAAAGCTGA TTCTGGTTAT 
2751 CTCTTTAAGG ACAATTAATT GTACAGACAC ATAATGTAAC ATTGTCTCAA 
2801 CATTCATTCA CAGATTGACT GTAAATTACC TTAATCTTTG TGCAGACTGA 
2851 AGGAACACTG TAGTATACCC CAAAGTGCAT TTGCCTAGGA CTTCTCAGCT 
2901 TCTCCCATAG GTAGTTTAAC AGGCATTAAA ATTTGTAATT GAAATGTTGC 
2951 TTTCACTCAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 279 bp to 2045 bp; peptide length: 589 
Category: similarity to known protein 
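
The frame number and the peptide length both follow from the ORF coordinates by simple arithmetic. The Python sketch below shows one way to derive them. The function names `reading_frame` and `orf_codons` are hypothetical, and the convention that the ORF end excludes the stop codon is an inference from the figures in this listing (279 to 2045 spans exactly 589 codons).

```python
def reading_frame(orf_start_bp: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based ORF start position."""
    if orf_start_bp < 1:
        raise ValueError("positions are 1-based")
    return (orf_start_bp - 1) % 3 + 1


def orf_codons(first_bp: int, last_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates.

    Assumption: last_bp is the final base of the last sense codon (stop excluded).
    """
    span = last_bp - first_bp + 1
    if span <= 0 or span % 3:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


if __name__ == "__main__":
    print(reading_frame(279), orf_codons(279, 2045))
```

The same functions can be applied to the other clones in this listing, for example an ORF from 40 bp to 573 bp.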



1 MAESGESGGP PGSQDSAAGA EGAGAPAAAA SAEPKIMKVT VKTPKEKEEF 
51 AVPENSSVQQ FKEEISKRFK SHTDQLVLIF AGKILKDQDT LSQHGIHDGL 
101 TVHLVIKTQN RPQDHSAQQT NTAGGNVTTS STPNSNSTSG SATSNPFGLG 
151 GLGGLAGLSS LGLNTTNFSE LQSQMQRQLL SNPEMMVQIM ENPFVQSMLS 
201 NPDLMRQLIM ANPQMQQLIQ RNPEISHMLN NPDIMRQTLE LARNPAMMQE 
251 MMRNQDRALS NLESIPGGYN ALRRMYTDIQ EPMLSAAQEQ FGGNPFASLV 
301 SNTSSGEGSQ PSRTENRDPL PNPWAPQTSQ SSSASSGTAS TVGGTTGSTA 
351 SGTSGQSTTA PNLVPGVGAS MFNTPGMQSL LQQITENPQL MQNMLSAPYM 
401 RSMMQSLSQN PDLAAQMMLN NPLFAGNPQL QEQMRQQLPT FLQQMQNPDT 
451 LSAMSNPRAM QALLQIQQGL QTLATEAPGL IPGFTPGLGA LGSTGGSSGT 
501 NGSNATPSEN TSPTAGTTEP GHQQFIQQML QALAGVNPQL QNPEVRFQQQ 
551 LEQLSAMGFL NREANLQALI ATGGDINAAI ERLLGSQPS 



BLASTP hits 



Entry CE1_1 from database TREMBL: 

"F15C11.2"; Caenorhabditis elegans cosmid VF15C11L 
Length = 293 

Score = 454 (159.8 bits), Expect = 4.4e-43, P = 4.4e-43 
Identities = 81/162 (50%), Positives = 113/162 (69%) 

Entry S54583 from database PIR: 

ubiquitin-like protein DSK2 - yeast (Saccharomyces cerevisiae) 
Length = 373 

Score = 278 (97.9 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 100/307 (32%), Positives = 155/307 (50%) 

Entry AB015344_1 from database TREMBLNEW: 

gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial cds. 
Score = 1135, P = 3.6e-115, identities = 227/301, positives = 253/301 



Alert BLASTP hits for DKFZphfbr2_312, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_312, frame 3 



Report for DKFZphfbr2_312.3 



[LENGTH] 589 

[MW] 62489.22 

[pI] 5.02 

[HOMOL] TREMBL:AB015344_1 gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial 
cds. 1e-121 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR276w] 2e-17 
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[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YMR276w] 2e-17 

[BLOCKS] BL00299 Ubiquitin family proteins 

[SUPFAM] unassigned ubiquitin-related proteins 5e-16 

[SUPFAM] ubiquitin homology 5e-16 

[PROSITE] MYRISTYL 24 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 7 

[PFAM] Ubiquitin family 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 23.43 % 

SEQ MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQ 

SEG . . xxxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxx 

laarA CEEEEEETTTCEEEECTTTTBHHH 



SEQ FKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQT 

SEG 

laarA HHHHHHHHHCCCGGGEEEEETTEECTTTTBGGGGCCTTTTEEEEEBC 

SEQ NTAGGNVTTSSTPNSNSTSGSATSNPFGLGGLGGLAGLSSLGLNTTNFSELQSQMQRQLL 

SEG . . . xxxxxxxxxxxxxxxxxxxxxx . . xxxxxxxxxxxxxxxx 

laarA 

SEQ SNPEMMVQIMENPFVQSMLSNPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLE 

SEG 

laarA 

SEQ LARNPAMMQEMMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLV 

SEG 

laarA 

SEQ SNTSSGEGSQPSRTENRDPLPNPWAPQTSQSSSASSGTASTVGGTTGSTASGTSGQSTTA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

laarA 

SEQ PNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYMRSMMQSLSQNPDLAAQMMLN 

SEG 

laarA 

SEQ NPLFAGNPQLQEQMRQQLPTFLQQMQNPDTLSAMSNPRAMQALLQIQQGLQTLATEAPGL 

SEG 

laarA 

SEQ IPGFTPGLGALGSTGGSSGTNGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQL 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxx 

laarA 

SEQ QNPEVRFQQQLEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS 

SEG 

laarA 
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PS00001 


55->59 ASN_GLYCOSYLATION PDOC00001 

PS00001 126->130 ASN_GLYCOSYLATION PDOC00001 

PS00001 136->140 ASN_GLYCOSYLATION PDOC00001 
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164->168 


ASN_GLYCOSYLATION PDOC00001 

PS00001 167->171 ASN_GLYCOSYLATION PDOC00001 


PS00001 
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asn" 


"GLYCOSYLATION 


PDOC00001 
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"GLYCOSYLATION 
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PDOC00002 
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40->43 
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PHOSPHO SITE 
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PKC" 
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PDOC00005 


PS00005 


66->69 


PKC" 
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PS00006 


43->47 CK2_PHOSPHO_SITE PDOC00006 

PS00006 71->75 CK2_PHOSPHO_SITE PDOC00006 

PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 

PS00006 200->204 CK2_PHOSPHO_SITE PDOC00006 

PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006 

PS00006 304->308 CK2_PHOSPHO_SITE PDOC00006 


PS00006 
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CK2~ 


"PHOSPHO SITE 


PDOC00006 


PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 

PS00006 572->576 CK2_PHOSPHO_SITE PDOC00006 

PS00008 8->14 MYRISTYL PDOC00008 

PS00008 12->18 MYRISTYL PDOC00008 
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MYRISTYL 


PDOC00008 


PS00008 366->372 MYRISTYL PDOC00008 

PS00008 479->485 MYRISTYL PDOC00008 

PS00008 489->495 MYRISTYL PDOC00008 


PS00008 
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PDOC00008 
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495- 
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499' 
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psooooa 
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->579 
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Pfam for DKFZphf br2_312 . 3 



HMM_NAME Ubiquitin family 

HMM *MQIFVKTLtGRTcTFEVepQEtVeqIKQHIeekEGIPPeQQRLIFaGRQ 

M ++VKT + +F V+++ V Q+K+ 1+ +Q +LIFAG+ 

Query 37 MKVTVKTPK-EKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKI 

HMM LEDeKTLsDYNIggeSTLHLVIR* 

L D TLS+++I + T+HLV++ 
Query 85 LKDQDTLSQHGIHDGLTVHLVIK 107 
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DKFZphfbr2_62bll 



group: signal transduction 

DKFZphfbr2_62bll encodes a novel 655 amino acid putative GTPase-activating protein, related to 
human chimaerins. 

The rac small GTPase is associated with type-I phosphatidylinositol 4-phosphate 5-kinase and 
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



similarity to CHIMAERIN 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: /map="4" 

Insert length: 4593 bp 

Poly A stretch at pos. 4571, polyadenylation signal at pos. 4553 



1 GGGGGAGTTT GAAGACAGAA AGGAAAGGGG AGAAACCTGC AGAGAGCATC 
51 AAAGGATGGG GGGTGCTATA AAAGAAGCAG GGGGGTCCTT TGAAAGAAAT 
101 CTATCATGCA CTGAAATGCT TTCTGGAGAA GGTGCCGTTA TTTTCCTCCC 
151 CTCTTGCTCA GATGAAAGGA GCCAGCAAGG ACAGTCCTGA AATATTCCTC 
201 AGGGGACTTT TTGTCATTGT TCCTCTTTCC TCTTGCACAG AGCTATTTGC 
251 TGACCTTTCC AGAGGAATCT CAGTCCAGCT GAGAAGACAG TTCTTAATAA 
301 AAACAAAAAA ATGCAAAAAC CAATTCCTGC TGTTTGAATG GGAATGGTAG 
351 CTTGCTTGCT GCAGTTCTTT TCCTGTGACA TTTTGGAATG TCTGCAGAAA 
401 CTTAAAAAAA AGAAAAAAAA AACCTTAAAA ACTCCCTGGA TTAGGCAAGA 
451 GAAAAGGAAG ---------- CTAAACAGGA GTAAATGAGA GGTGGTAACT 
501 TATCCCTAAG CCAGGACCTG GATGATCAAA ACCTTCAAAT TCTAGGGATC 
551 AGCACTTCAA AAATAACAAG TAAACAAGCA TGAGGAGTGG CTGTTGGGTT 
601 TCGCTCAGAG GCAGGTTTTA AAGGAAGCCA AAACCGGGTT CAGAACTTCA 
651 GGCCTGTACG ATGCCTGAAG ACCGGAATTC TGGGGGGTGC CCGGCTGGTG 
701 CCTTAGCCTC AACTCCTTTC ATCCCTAAAA CTACATACAG AAGAATCAAA 
751 CGGTGTTTTA GTTTTCGGAA AGGCATTTTT GGACAGAAAC TGGAGGATAC 
801 TGTTCGTTAT GAGAAGAGAT ATGGGAACCG TCTGGCTCCG ATGTTGGTGG 
851 AGCAGTGCGT GGACTTTATC CGACAAAGGG GGCTGAAAGA AGAGGGTCTC 
901 TTTCGACTGC CAGGCCAGGC TAATCTTGTT AAGGAGCTCC AAGATGCCTT 
951 TGACTGTGGG GAGAAGCCAT CATTTGACAG CAACACAGAT GTACACACGG 
1001 TGGCATCACT TCTTAAGCTG TACCTCCGAG AACTTCCAGA ACCAGTTATT 
1051 CCTTATGCGA AGTATGAAGA TTTTTTGTCA TGTGCCAAAC TGCTCAGCAA 
1101 GGAAGAGGAA GCAGGTGTTA AGGAATTAGC AAAGCAGGTG AAGAGTTTGC 
1151 CAGTGGTAAA TTACAACCTC CTCAAGTATA TTTGCAGATT CTTGGATGAA 
1201 GTACAGTCCT ACTCGGGAGT TAACAAAATG AGTGTGCAGA ACTTGGCAAC 
1251 GGTCTTTGGT CCTAATATCC TGCGCCCCAA AGTGGAAGAT CCTTTGACTA 
1301 TCATGGAGGG CACTGTGGTG GTCCAGCAGT TGATGTCAGT GATGATTAGC 
1351 AAACATGATT GCCTCTTTCC CAAAGATGCA GAACTACAAA GCAAGCCCCA 
1401 AGATGGAGTG AGCAACAACA ATGAAATTCA GAAGAAAGCC ACCATGGGGC 
1451 TGTTACAGAA CAAGGAGAAC AATAACACCA AGGACAGCCC TAGTAGGCAG 
1501 TGCTCCTGGG ACAAGTCTGA GTCACCCCAG AGAAGCAGCA TGAACAATGG 
1551 ATCCCCCACA GCTCTATCAG GCAGCAAAAC CAACAGCCCA AAGAACAGTG 
1601 TTCACAAGCT AGATGTGTCT AGAAGCCCCC CTCTCATGGT CAAAAAGAAC 
1651 CCAGCCTTTA ATAAGGGTAG TGGGATAGTT ACCAATGGGT CCTTCAGCAG 
1701 CAGTAATGCA GAAGGTCTTG AGAAAACCCA AACCACCCCC AATGGGAGCC 
1751 TACAGGCCAG AAGGAGCTCT TCACTGAAGG TATCTGGTAC CAAAATGGGC 
1801 ACGCACAGTG TACAGAATGG AACGGTGCGC ATGGGCATTT TGAACAGCGA 
1851 CACACTCGGG AACCCCACAA ATGTTCGAAA CATGAGCTGG CTGCCAAATG 
1901 GCTATGTGAC CCTGAGGGAT AACAAGCAGA AAGAACAAGC TGGAGAGTTA 
1951 GGCCAGCACA ACAGACTGTC CACCTATGAT AATGTCCATC AACAGTTCTC 
2001 CATGATGAAC CTTGATGACA AGCAGAGCAT TGACAGTGCT ACCTGGTCCA 
2051 CTTCCTCCTG TGAAATCTCC CTCCCTGAGA ACTCCAACTC CTGTCGCTCT 
2101 TCTACCACCA CCTGCCCAGA GCAAGACTTT TTTGGGGGGA ACTTTGAGGA 
2151 CCCTGTTTTG GATGGGCCCC CGCAGGACGA CCTTTCCCAC CCCAGGGACT 
2201 ATGAAAGCAA AAGTGACCAC AGGAGTGTGG GAGGTCGAAG TAGTCGTGCC 
2251 ACCAGTAGCA GTGACAACAG TGAGACATTT GTGGGCAACA GCAGCAGCAA 
2301 CCACAGTGCA CTGCACAGTT TAGTTTCCAG CCTGAAACAG GAAATGACCA 
2351 AACAGAAGAT AGAGTATGAG TCCAGGATAA AGAGCTTAGA ACAGCGAAAC 
2401 TTGACTTTGG AAACAGAAAT GATGAGCCTC CATGATGAAC TGGATCAGGA 
2451 GAGGAAAAAG TTCACAATGA TAGAAATAAA AATGCGAAAT GCCGAGCGAG 
2501 CAAAAGAAGA TGCCGAGAAA AGAAATGACA TGCTACAGAA AGAAATGGAG 
2551 CAGTTTTTTT CCACGTTTGG AGAACTGACA GTGGAACCCA GGAGAACCGA 
2601 GAGAGGAAAC ACAATATGGA TTCAGTGAGC CTGCTTTCGC CTGCTGTCTC 
2651 TGATGGCTCT GGCAAGGACT CCAGGGATTC TGGTGGGATA TGACTTAGAA 
2701 CCAGGTGGCT GGTCACCTGG ATGTACAGAA GTCTAACTGG TGAAGGAATA 
2751 TCATTTACAG ACATTAAACA TCCATATCTG CAATGTGTAC CAAAGTTATA 
2801 TCATGCCCCA TAATGCTACT GTCAAGTGTT ACAACTGGAT ATGTGTATAT 
2851 AGAGTAGTTT TTCAAAAGTA AACTAAAAAT GAGAAGCATA TTTCAAGAAT 
2901 TATTTTATTG CAAGTCTTGT ATTTAAATGT TAAATCAATA TGTTGTTGCA 
2951 ATTTAGCTTG CTTTCAAGCT TCACCCCTTG CACTTAACAT AAGCTATTTT 
3001 TGGCATTGTG TTATCATCGG CTTATTTTAT AGATCAATAT TTTTATTTCC 
3051 CTTTTTTGCT GAGGAAATGA AGATAAGCAA AAATATAAAT ATATATATAA 
3101 ATATATGAGT TATTAAAACC AGAAGAATAC TTTGTGGCTG TGCTGTTTGT 
3151 GCCAATAGAC TTTGTCATGA CCAAAAAGAG AAATGTAAAT AGTTTTATAA 
3201 AATACAGTCG AATCACCAGG AACCTTTGAG CTGCTTTTAA AATTCTTCCC 
3251 CTGGCACCAC TCAGTTTTGC TTTTGCGAGG CGATTTGACA TAGGAACTTT 
3301 GAGACTCCAT GAGAAAGTCC CTTTCTGAGG CCCACTGTCT ACCTTGCCAG 
3351 ATCCTCAGTG CGTATCGCCA ATGCAGGATG CTCCTTAGAA AAGAAAAAAT 
3401 GGTAAAGGAT GGCATTTAAC GATTCAGGCT TTGAATTACT CTGTCCCTCT 
3451 GGACCGAATC TCTTTAACTG CTGGATAGTT TTAGAGGAAT TCTCCTGCTA 
3501 CTTAGGTACT GGGAAACAAT GCTTGCTAAA CCATGCCCAC GTGAGCACCT 
3551 GTCTCCCACT CAAACCTCTC CCATCTCCCA ACAACTGCAC TTTAGAATAC 
3601 CAGCAGTGAA ATGGTATTAC TGTTTCCCTC TGAGTGAAAC TGCTAGAGTA 
3651 TATGTCACGT AGTGACATTT TTTTCTCACT CAGGCTATTG CCATCTGGGA 
3701 TTCTCTCCCT ACTACAGCTG GCAAAGTTGG TTTGCAGCAA GAAGATAGTG 
3751 GGAGGGGGCC AGGCTGCAGG AGAAGGAGAA AAGTTTAGAA GAAACAAACC 
3801 ATTTTGCTTC TAATTTTGAC AGTATCACTT TCCTGTTAAA ACATACAATA 
3851 ATTTTAAAAG GTGAATGCCT AAAGTTCCAA TTTTAGCAAA TATGGGAACC 
3901 TCAGCAATGC TAATTTTCTA GAAAAACCCA GGGCTCTTTG GAGCTAGAGT 
3951 TTTGGGAGAA CAGTTCTTCA CAATAAGGCA ATGGTTTTGA GAGGCCAGGC 
4001 AAATAATCTT TCTCACCGTA GAACAAAAAG TTACAAAAGG CATAATCGGA 
4051 AATAGAGACT ACATACTTGA GTTTATGGGG TTTGTGTTGT TTGAAGGTTC 
4101 AATGCTTGCA TGTGTTTATT TATTTTCAAG AGGGAAAGTG GTCTGTACTG 
4151 CTTTCATCCT TGCCACTGTC TTGCTTTTAT TTTTTACTCT CCCACTGAGC 
4201 AAGCGTCTGT GGTCCTATGG TATCAACCAG TATCTTTATA GCAATAATTT 
4251 CTTTAATTCC CTTTTCTCTC TCTTTCCAAT TATTTAACCA GTTACTTCCA 
4301 CCTGGACATA CGATAGGAAA TTCAAACTCA AAATATGAAA ATTGATCTTA 
4351 ATAACTCTCC CTTCATATCT TTTCACCTAT TTCCAGTCCT TATCATAGTT 
4401 GATAAAAACC TCAGACTCAT CCAGAAAGCT ATATGATGCA CTAGTAAAAA 
4451 AAACAAAGAT ATTTAAACTG CTTGGGTTCA AATGGTATAC AATTTGCCAG 
4501 CTGTTACTGA ACCTTCTATG CATAACTTTT TTTTTCCTCT GTGCAATTGG 
4551 AATAATAAAA ATACTACTCC CATAAAAAAA AAAAAAAAAA AAC 



BLAST Results 



Entry G38474 from database EMBLNEW: 

SHGC-58303 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 2175, P = 1.2e-92, identities = 439/441 



Medline entries 



97476250: 

Beta2-chimaerin is a high affinity receptor for the phorbol ester tumor 
promoters. 



Peptide information for frame 1 



ORF from 661 bp to 2625 bp; peptide length: 655 
Category: similarity to known protein 



1 MPEDRNSGGC PAGALASTPF IPKTTYRRIK RCFSFRKGIF GQKLEDTVRY 
51 EKRYGNRLAP MLVEQCVDFI RQRGLKEEGL FRLPGQANLV KELQDAFDCG 
101 EKPSFDSNTD VHTVASLLKL YLRELPEPVI PYAKYEDFLS CAKLLSKEEE 
151 AGVKELAKQV KSLPVVNYNL LKYICRFLDE VQSYSGVNKM SVQNLATVFG 
201 PNILRPKVED PLTIMEGTVV VQQLMSVMIS KHDCLFPKDA ELQSKPQDGV 
251 SNNNEIQKKA TMGLLQNKEN NNTKDSPSRQ CSWDKSESPQ RSSMNNGSPT 
301 ALSGSKTNSP KNSVHKLDVS RSPPLMVKKN PAFNKGSGIV TNGSFSSSNA 
351 EGLEKTQTTP NGSLQARRSS SLKVSGTKMG THSVQNGTVR MGILNSDTLG 
401 NPTNVRNMSW LPNGYVTLRD NKQKEQAGEL GQHNRLSTYD NVHQQFSMMN 
451 LDDKQSIDSA TWSTSSCEIS LPENSNSCRS STTTCPEQDF FGGNFEDPVL 
501 DGPPQDDLSH PRDYESKSDH RSVGGRSSRA TSSSDNSETF VGNSSSNHSA 
551 LHSLVSSLKQ EMTKQKIEYE SRIKSLEQRN LTLETEMMSL HDELDQERKK 
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601 FTMIEIKMRN AERAKEDAEK RNDMLQKEME QFFSTFGELT VEPRRTERGN 
651 TIWIQ 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_62bll, frame 1 

SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053., N = 3, Score = 
661, P = 2.4e-89 

TREMBL:HSU90908_1 product: "unknown"; Human clones 23549 and 23762 
mRNA, complete cds., N = 1, Score = 348, P = 1.1e-29 

PIR:S29128 N-chimerin - rat, N = 1, Score = 286, P = 2.8e-24 

PIR:S29956 beta-chimerin - rat, N = 1, Score = 279, P = 1.6e-23 

TREMBL:AB014572_1 gene: "KIAA0672"; product: "KIAA0672 protein"; Homo 
sapiens mRNA for KIAA0672 protein, complete cds., N = 1, Score = 314, P 
= 1e-24 



>SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 
Length = 638 



HSPs: 



Score = 661 (99.2 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 122/209 (58%), Positives = 160/209 (76%) 



Query: 38 GIFGQKLEDTVRYEKRYGNRLAPMLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAF 97 

G+FGQ+L++TV YE+++G L P+LVE+C +FI + G EEG+FRLPGQ NLVK+L+DAF 
Sbjct: 148 GVFGQRLDETVAYEQKFGPHLVPILVEKCAEFILEHGRNEEGIFRLPGQDNLVKQLRDAF 207 

Query: 98 DCGEKPSFDSNTDVHTVASLLKLYLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELA 157 

D GE+PSFD +TDVHTVASLLKLYLR+LPEPV+P+++YE FL C +L + +E +EL 
Sbjct: 208 DAGERPSFDRDTDVHTVASLLKLYLRDLPEPVVPWSQYEGFLLCGQLTNADEAKAQQELM 267 

Query: 158 KQVKSLPVVNYNLLKYICRFLDEVQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEG 217 

KQ+ LP NY+LL YICRFL E+Q VNKMSV NLATV G N++R KVEDP IM G 
Sbjct: 268 KQLSILPRDNYSLLSYICRFLHEIQLNCAVNKMSVDNLATVIGVNLIRSKVEDPAVIMRG 327 



Query: 218 TVVVQQLMSVMISKHDCLFPKDAELQSKP 246 

T +Q++M++MI H+ LFPK ++ P 
Sbjct: 328 TPQIQRVMTMMIRDHEVLFPKSKDIPLSP 356 



Score = 210 (31.5 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 45/115 (39%), Positives = 73/115 (63%) 



Query: 531 TSSSDNSETFVGNSSSNHSALHSL---VSSLKQEMTKQKIEYESRIKSLEQRNLTLETEM 587 

T +S NSET G +S + SL V L++E+ QK YE +IK+LE+ N + ++ 
Sbjct: 523 TLASPNSETGPGKKNSGEEEIDSLQRMVQELRKEIETQKQMYEEQIKNLEKENYDVWAKV 582 

Query: 588 MSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVE 642 

+ L++EL++E+KK +EI +RN ER++ED EKRN L++E+++F + E E 
Sbjct: 583 VRLNEELEKEKKKSAALEISLRNMERSREDVEKRNKALEEEVKEFVKSMKEPKTE 637 

Score = 70 (10.5 bits), Expect = 1.2e-74, Sum P(3) = 1.2e-74 
Identities = 28/121 (23%), Positives = 54/121 (44%) 

Query: 528 SRATSSSDNSETFVGNSSSNHSALHSLVSSLKQE-MTKQKIEYESRIKSLEQRNL-TLET 585 

S+ TS+ DN + G+ SAL S K + + E K+ + + +L+ 

Sbjct: 489 SQRTSTYDNVPSLPGSPGEEASALSSQACDSKGDTLASPNSETGPGKKNSGEEEIDSLQR 548 

Query: 586 EMMSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRR 645 

+ L E++ +++ M E +++N E+ D + L +E+E+ L + R 

Sbjct: 549 MVQELRKEI ETQKQ MYEEQIKNLEKENYDVWAKVVRLNEELEKEKKKSAALEISLRN 605 



Query: 646 TER 648 
ER 

Sbjct: 606 MER 608 

Score = 53 (8.0 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 31/111 (27%), Positives = 46/111 (41%) 



Query: 344 S FS S SNAEGLEKTQTT PNGS LQARRSSSLKVSGTKMGTHS VQNG TV--RMGILNSD 397 

SFSS +++TT A SKV KG +Q+ T+ R L S 

Sbjct: 388 SFSSMTSDS-DTTSPTGQQPSDAFPEDSSKVPREKPGDWKMQSRKRTQTLPNRKCFLTSA 446 
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Query: 398 TLG-NPTNV RNMSWLPNGYVTLRDNKQKEQAGELGQ HNRLSTYDNV 442 

G N + + +N W P+ + ++ + +L Q R STYDNV 

Sbjct: 447 FQGANSSKMEIFKNEFWSPSSEAKAGEGHRRTMSQDLRQLSDSQRTSTYDNV 498 

Score = 53 (8.0 bits), Expect = 3.5e-14, Sum P(3) = 3.5e-14 
Identities = 32/125 (25%), Positives = 56/125 (44%) 

Query: 242 LQSKPQDG VSNNNEIQKKATMGLLQNKEN — NNTKD SPSRQCSWDKSESPQRSS 293 

+ + SK +D + +IQ+ TM ++++ E +KD SP Q + K RSS 

Sbjct: 314 IRSKVEDPAVIMRGTPQIQRVMTM-MIRDHEVLFPKSKDIPLSPPAQKNDPKKAPVARSS 372 

Query: 294 MNNGSPTALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGL 353 

+ + L S+T+S + D+P++AF+SV + 

SbjCt: 373 VGWDATEDLRISRTDSFSSMTSDSDTTS — PTGQQPSDAFPEDSSKVPREKPGDWKMQSR 430 

Query: 354 EKTQTTPN 361 

++TQT PN 
Sbjct: 431 KRTQTLPN 438 

Pedant information for DKFZphfbr2_62bll, frame 1 



Report for DKFZphfbr2_62bll.1 

[LENGTH] 655 

[MW] 73394.60 

[pI] 8.13 

[HOMOL] SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 3e-71 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins 
[S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YPL115c] 
1e-16 

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155C] 2e-16 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-15 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YDR379w] 4e-16 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-15 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 2e-13 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 2e-13 

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 2e-46 

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 6e-37 

[PIRKW] phosphotransferase 3e-13 

[PIRKW] breakpoint cluster region 2e-20 

[PIRKW] transmembrane protein 7e-14 

[PIRKW] brain 2e-20 

[PIRKW] alternative splicing 2e-20 

[PIRKW] p-loop 9e-19 

[PIRKW] cytoskeleton 1e-08 

[SUPFAM] CDC24 homology 7e-21 

[SUPFAM] bcr protein 7e-21 

[SUPFAM] myosin motor domain homology 9e-19 

[SUPFAM] pleckstrin repeat homology 2e-15 

[SUPFAM] LIM metal-binding repeat homology 9e-15 

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-24 

[PROSITE] MYRISTYL 16 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 15 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 11 

[PROSITE] ASN_GLYCOSYLATION 8 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.87 % 

[KW] COILED_COIL 12.06 % 
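
The two percentages above can be read as the share of residues flagged in the SEG and COILS tracks relative to the peptide length of 655. The Python sketch below shows that arithmetic. The function name `masked_percent` is hypothetical, and the residue counts of 45 (low complexity) and 79 (coiled coil) are assumptions chosen because they reproduce the reported values; they are not stated anywhere in the report.

```python
def masked_percent(masked_residues: int, length: int) -> float:
    """Percentage of residues flagged by a track, rounded to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= masked_residues <= length:
        raise ValueError("masked_residues must lie between 0 and length")
    return round(100.0 * masked_residues / length, 2)


if __name__ == "__main__":
    LENGTH = 655  # [LENGTH] from the report above
    # Assumed counts: 45 residues marked x by SEG, 79 marked C by COILS.
    print(masked_percent(45, LENGTH), masked_percent(79, LENGTH))
```

Running the calculation in reverse, by multiplying a reported percentage by the length and rounding, gives a quick consistency check on the track lines that follow.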

SEQ MPEDRNSGGCPAGALASTPFIPKTTYRRIKRCFSFRKGIFGQKLEDTVRYEKRYGNRLAP 

SEG 

COILS 

lrgp- C 



SEQ MLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAFDCGEKPSFDSNTDVHTVASLLKL 

SEG 

COILS 

lrgp- HHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCGGGCCCCHHHHHHHHH 

SEQ YLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELAKQVKSLPVVNYNLLKYICRFLDE 

SEG 
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COILS 

lrgp- HHHHTTTTTTTGGGHHHHHH TTTTCGGGHHHHHHHHHHHCCHHHHHHHHHHHHHHHH 

SEQ VQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEGTVVVQQLMSVMISKHDCLFPKDA 

SEG 

COILS 

lrgp- HHHHHHHHCCCHHHHHHHHGGGCC 

SEQ ELQSKPQDGVSNNNEIQKKATMGLLQNKENNNTKDSPSRQCSWDKSESPQRSSMNNGSPT 

SEG 

COILS 

lrgp- 

SEQ ALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGLEKTQTTP 

SEG 

COILS 

lrgp- 

SEQ NGSLQARRSSSLKVSGTKMGTHSVQNGTVRMGILNSDTLGNPTNVRNMSWLPNGYVTLRD 

SEG 

COILS 

lrgp- 

SEQ NKQKEQAGELGQHNRLSTYDNVHQQFSMMNLDDKQSIDSATWSTSSCEISLPENSNSCRS 

SEG xxxxxxx 

COILS 

lrgp- 

SEQ STTTCPEQDFFGGNFEDPVLDGPPQDDLSHPRDYESKSDHRSVGGRSSRATSSSDNSETF 

SEG xxxxx xxxxxxxxxxxxxxxxx. . . 

COILS 

lrgp- 

SEQ VGNSSSNHSALHSLVSSLKQEMTKQKIEYESRIKSLEQRNLTLETEMMSLHDELDQERKK 

SEG . . xxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

lrgp- 

SEQ FTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRRTERGNTIWIQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

lrgp- 
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(No Pfam data available for DKFZphfbr2_62bll . 1) 
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DKFZphfbr2_62f 10 



group: intracellular transport and trafficking 

DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with strong similarity to mammalian 
zinc transporter proteins. 

The novel protein is a membrane protein, which should be involved in the transport of zinc 
across the cell membrane. 

The ZnT transporters are membrane proteins that facilitate sequestration of zinc in 
endosomal vesicles. In the brain, ZnT-3 mRNA seems to be involved in the accumulation of zinc 
in synaptic vesicles. Zinc (Zn) is an essential element in normal development and metabolism. 
Recent studies show that in Alzheimer's disease, Zn functions as a double-edged sword, 
affording protection against Alzheimer's amyloid beta peptide (the major component of senile 
plaques) at low concentrations and enhancing toxicity at high concentrations by accelerating 
aggregation of the amyloid beta peptide. 

The new protein can find application in modulation of zinc transport in neuronal cells, thus 
providing a means of modulating Alzheimer's amyloid beta peptide plaque formation. 



strong similarity to zinc transporter proteins; 
membrane regions: 5 

Summary DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with 
similarity to zinc transporter protein. 

The new protein can find clinical application in modulating Zn2+ 
uptake. 



strong similarity to zinc transporter proteins 

complete cDNA, complete cds, few EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 5422 bp 

Poly A stretch at pos. 5397, polyadenylation signal at pos. 5381 



1 GTCTAACTTT GGAAATATCA CCCTCATGCT GTCTTCCCAG GATGTCTCTC 

51 TCCCTAAGTA AGGGATGTTA CTTCCTGGAG GGAATGCAGT GTTGGGAATC 

101 TGAAGACCCA GCTTTGAGCT GAATTTGCTT TGTGATACCT GGAGAGAAGA 

151 CGTGTTTTCT TGACAACAGC ACAGTACCTA GTGAGTTCAA CAACAACGAC 

201 AACAACAGCC GCAGCTCATC CTGGCCGTCA TGGAGTTTCT TGAAAGAGCG 

251 TATCTTGTGA ATGATAAAGC TGCCAAGATG TATGCTTTCA CACTAGAAAG 

301 AAGGAGCTGC AAATGAACAC TTCATAGCAA TGTGGAACTC CAACAGAAAC 

351 CGGTGAATAA AGATCAGTGT CCCAGAGAGA GACCAGAGGA GCTGGAGTCA 

401 GGAGGCATGT ACCACTGCCA CAGTGGCTCC AAGCCCACAG AAAAGGGGGC 

451 GAATGAGTAC GCCTATGCCA AGTGGAAACT CTGTTCTGCT TCAGCAATAT 

501 GCTTCATTTT CATGATTGCA GAGGTCGTGG GTGGGCACAT TGCTGGGAGT 

551 CTTGCTGTTG TCACAGATGC TGCCCACCTC TTAATTGACC TGACCAGTTT 

601 CCTGCTCAGT CTCTTCTCCC TGTGGTTGTC ATCGAAGCCT CCCTCTAAGC 

651 GGCTGACATT TGGATGGCAC CGAGCAGAGA TCCTTGGTGC CCTGCTCTCC 

701 ATCCTGTGCA TCTGGGTGGT GACTGGCGTG CTAGTGTACC TGGCATGTGA 

751 GCGCCTGCTG TATCCTGATT ACCAGATCCA GGCGACTGTG ATGATCATCG 

801 TTTCCAGCTG CGCAGTGGCG GCCAACATTG TACTAACTGT GGTTTTGCAC 

851 CAGAGATGCC TTGGCCACAA TCACAAGGAA GTACAAGCCA ATGCCAGCGT 

901 CAGAGCTGCT TTTGTGCATG CCCCTGGAGA TCTATTTCAG AGTATCAGTG 

951 TGCTAATTAG TGCACTTATT ATCTACTTTA AGCCAGAGTA TAAAATAGCC 

1001 GACCCAATCT GCACATTCAT CTTTTCCATC CTGGTCTTGG CCAGCACCAT 

1051 CACTATCTTA AAGGACTTCT CCATCTTACT CATGGAAGGT GTGCCAAAGA 

1101 GCCTGAATTA CAGTGGTGTG AAAGAGCTTA TTTTAGCAGT CGACGGGGTG 

1151 CTGTCTGTGC ACTGCCTGCA CATCTGGTCT CTAACAATGA ATCAAGTAAT 

1201 TCTCTCAGCT CATGTTGCTA CAGCAGCCAG CCGGGACAGC CAAGTGGTTC 

1251 GGAGAGAAAT TGCTAAAGCC CTTAGCAAAA GCTTTACGAT GCACTCACTC 

1301 ACCATTCAGA TGGAATCTCC AGTTGACCAG GACCCCGACT GCCTTTTCTG 

1351 TGAAGACCCC TGTGACTAGC TCAGTCACAC CGTCAGTTTC CCAAATTTGA 

1401 CAGGCCACCT TCAAACATGC TGCTATGCAA TTTCTGCATC ATAGAAAATA 

1451 AGGAACCAAA GGAAGAAATT CATGTCATGG TGCAATGCAT ATTTTATCTA 

1501 TTTATTTAGT TCCATTCACC ATGAAGGAAG AGGCACTGAG ATCCATCAAT 

1551 CAATTGGATT ATATACTGAT CAGTAGCTGT GTTCAATTGC AGGAATGTGT 

1601 ATATAGATTA TTCCTGAGTG GAGCCGAAGT AACAGCTGTT TGTAACTATC 

1651 GGCAATACCA AATTCATCTC CCTTCCAATA ATGCATCTTG AGAACACATA 

1701 GGTAAATTTG AACTCAGGAA AGTCTTACTA GAAATCAGTG GAAGGGACAA 

1751 ATAGTCACAA AATTTTACCA AAACATTAGA AACAAAAAAT AAGGAGAGCC 

1801 AAGTCAGGAA TAAAAGTGAC TCTGTATGCT AACGCCACAT TAGAACTTGG 

1851 TTCTCTCACC AAGCTGTAAT GTGATTTTTT TTTCTACTCT GAATTGGAAA 
1901 TATGTATGAA TATACAGAGA AGTGCTTACA ACTAATTTTT ATTTACTTGT 
1951 CACATTTTGG CAATAAATCC CTCTTATTTC TAAATTCTAA CTTGTTTATT 
2001 TCAAAACTTT ATATAATCAC TGTTCAAAAG GAAATATTTT CACCTACCAG 
2051 AGTGCTTAAA CACTGGCACC AGCCAAAGAA TGTGGTTGTA GAGACCCAGA 
2101 AGTCTTCAAG AACAGCCGAC AAAAACATTC GAGTTGACCC CACCAAGTTG 
2151 TTGCCACAGA TAATTTAGAT ATTTACCTGC AAGAAGGAAT AAAGCAGATG 
2201 CAACCAATTC ATTCAGTCCA CGAGCATGAT GTGAGCACTG CTTTGTGCTA 
2251 GACATTGGGC TTAGCACTGA AACTATAAAG AGGAATCAGA CGCAGCAAGT 
2301 GCTTCTGTGT TCTGGTAGCA ACTCAACACT ATCTGTGGAG AGTAAACTGA 
2351 AGATGTGCAG GCCAACATTC TGGAAATCCT ATGTCAGTGG GTTTGGTTTG 
2401 GAACCTGGAC TTCTGCATTT TTAAAAGTTA CCCAGAGATG CTTCTAAAGA 
2451 TGAGCCATAG TCTAGAAGAT TGTCAACCAC AGGAGTTCAT TGAGTGGGAC 
2501 AGCTAGACAC ATACATTGGC AGTTACAATA GTATCATGAA TTGCAATGAT 
2551 GTAGTGGGGT ATAAAAGGAA AGCGATGGAT ATTGCCGGAT GGGCATGGCC 
2601 AGTGATGTTT CACGTCATTG AGGTGACAGC TCTGCTGGAC TTTGAATTAC 
2651 ATATGGAGGC TCTCCAGGAA GACGAAGAAG AGAAGGACAT TCTAGGCAAA 
2701 AAGAAGACTA GGCACAAGGC ACACTTATGT TTGTCTGTTA GCTTTTAGTT 
2751 GAAAAAGCAA AATACATGAT GCAAAGAAAC CTCTCCACGC TGTGATTTTT 
2801 AAAACTACAT ACTTTTTGCA ACTTTATGGT TATGAGTATT GTAGAGAACA 
2851 GGAGATAGGT CTTAGATGAT TTTTATGTTG TTGTCAGACT CTAGCAAGGT 
2901 ACTAGAAACC TAGCAGGCAT TAATAATTGT TGAGGCAATG ACTCTGAGGC 
2951 TATATCTGGG CCTTGTCATT ATTTATCATT TATATTTGTA TTTTTTTCTG 
3001 AAATTTGAGG GCCAAGAAAA CATTGACTTT GACTGAGGAG GTCACATCTG 
3051 TGCCATCTCT GCAAATCAAT CAGCACCACT GAAATAACTA CTTAGCATTC 
3101 TGCTGAGCTT TCCCTGCTCA GTAGAGACAA ATATACTCAT CCCCCACCTC 
3151 AGTGAGCTTG TTTAGGCAAC CAGGATTAGA GCTGCTCAGG TTCCCAACGT 
3201 CTCCTGCCAC ATCGGGTTCT CAAAATGGAA AGAATGGTTT ATGCCAAATC 
3251 ACTTTTCCTG TCTGAAGGAC CACTGAATGG TTTTGTTTTT CCATATTTTG 
3301 CATAGGACGC CCTAAAGACT AGGTGACTTG GCAAACACAC AAGTGTTAGT 
3351 ATAATTCTTT GCTTCTGCTT CTTTTTGAAA ATCATGTTTA GATTTGATTT 
3401 TAAGTCAGAA ATTCACTGAA TGTCAGGTAA TCATTATGGA GGGAGATTTG 
3451 TGTGTCAACC AAAGTAATTG TCCCATGGCC CCAGGGTATT TCTGTTGTTT 
3501 CCCTGAAATT CTGCTTTTTT AGTCAGCTAG ATTGAAAACT CTGAACAGTA 
3551 GATGTTTATA TGGCAAAATG CAAGACAATC TATAAGGGAG ATTTTAAGGA 
3601 TTTTGAGATG AAAAAACAGA TGCTACTCAG GGGCTTTATG GACCATCCAT 
3651 CAATTCTGAA GTTCTGACTC TCCCATTACC CTTTCCCTGG TGTGGTCAGA 
3701 ACTCCAGGTC ACTGGAAGTT AGTGGAATCA TGTAGTTGAA TTCTTTACTT 
3751 CAAGACATTG TATTCTCTCC AGCTATCAAA ACATTAATGA TCTTTTATGT 
3801 CTTTTTTTTG TTATTGTTAT ACTTTAAGTT CTGGGGTACA TGTGCGGAAC 
3851 ATGTAGGTTT GTTACATAGG TATACATGTG CCATGGTGGT TTGCTGCACT 
3901 CATCAACCTG TCATCTACAT TCTTTTATGT CTGTCTTTCA AAGCAACACT 
3951 CTGTTCTTCT GAGTAGTGAA ATCAGGTCAA CTTTACCACC AGCCTCCATT 
4001 TTTAATATGC TTCACCATCA TCCAGCACCT ACTTAAGATT TATCTAGGGC 
4051 TCTGTGGTGA TGTTAGGACC CATAAAAGAA ATTTATGCCT TCCATATCTT 
4101 TGGTTACAGA TGGGAAATGG GAATGTTGAA GGACATGAAA GAAAGGATGT 
4151 TTACACATTA AGCATCAGTT CTGAAGCTAG ATTGTCTGAG TTTGAATCTT 
4201 AGCTCTTCCC TTTATTAGCT CTGTGACCTC GAGCTAGTTA CTTAAATGCT 
4251 CTGATCCTCT ATTTCCTGAT CAGTGAAACC TCCCTATTCA AATGTGTGAG 
4301 AGTTTAATAA ATTAGGACAC TTAAAAATGT TGGAGCAGTG CATAGCATGT 
4351 AGTGTTCAGT ACATGTTAAA TGTTGTTTTT TATTATGTAC AAACATGTGT 
4401 GGGCACAGAA TTTTAAATCA TCTCAACTTT TGAGAAATTT TGAGTTATCA 
4451 ACACCGTTCC CACAAGACAG TGGCAAAATT ATTGGTGAGA ATTAAACAGC 
4501 TGTTTCTCAG AGGAAGCAAT GGAGGCTTGC TGGGATAAAG GCATTTACTG 
4551 AGAGGCTGTT ACCTAGTGAG AGTGATGAAT TAATTAAAAT AGTCGAATCC 
4601 CTTTCTGACT GTCTCTGAAA GGTTCCGCTT TTATCTTTGA AGAGCAGAAT 
4651 TGTCACCCCA AGGACATTTA TTAATAAAAA GAACAACTGT CCAGTGCAAT 
4701 GAAGGCAAAG TCATAGGTCT CCCAAGTCTT ACCCCATTCC TGTGAAATAT 
4751 CAAGTTCTTG GCTTTTCTCT GTCATGTAGC CTCAACTTTC TCCGACCGGG 
4801 TGCATTTCTT TCTCTGGTTT CTAAATTGCC AGTGGCAAAT TTGGATCACT 
4851 TACTTAATAT CTGTTAAATT TTGTGACCCA ACAAAGTCTT TTAGCACTGT 
4901 GGTGTCAAAA AGAAAAACAC CTCCCAGGCA TATACATTTT ATAGATTCCT 
4951 GGAGAATGTT GCTCTCCAGC TCCATCCCCA CCCAATGAAA TATGATCCAG 
5001 AGAGTCTTGC AAAGAGACAA GCCTCATTTT CCACAATTAG CTCTAAAGTG 
5051 CCTCCAGGAA ATGATTTTCT CAGCTCATCT CTCTGTATTC CCTGTTTTGG 
5101 ATCACAGGGC AATCTGTTTA AATGACTAAT TACAGAAATC ATTAAAGGCA 
5151 CCAAGCAAAT GTCATCTCTG AATACACACA TCCCAAGCTT TACAAATCCT 
5201 GCCTGGCTTG ACAGTGATGA GGCCACTTAA CAGTCCAGCG CAGGCGGATG 
5251 TTAAAAAAAA TAAAAAGGTG ACCATCTGCG GTTTAGTTTT TTAACTTTCT 
5301 GATTTCACAC TTAACGTCTG TCATTCTGTT ACTGGGCACC TGTTTAAATT 
5351 CTATTTTAAA ATGTTAATGA GTGTTGTTTA AAATAAAATC AGGAAAGAGA 
5401 GAAAAAAAAA AAAAAAAAAA AC 



BLAST Results 

No BLAST result 

Medline entries 

97121493: 
ZnT-3, a putative transporter of zinc into synaptic vesicles. 

96203098: 
ZnT-2, a mammalian protein that confers resistance to zinc by 
facilitating vesicular sequestration. 



Peptide information for frame 2 



ORF from 407 bp to 1366 bp; peptide length: 320 
Category: strong similarity to known protein 
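
The ORF coordinates and the peptide length quoted above are related by simple arithmetic, which the following minimal Python sketch illustrates. It assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range (reading the end coordinate as 1366); the function name `orf_peptide_length` is hypothetical and is not part of any annotation tool named in this report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons spanned by an ORF.

    Assumes 1-based, inclusive coordinates with the stop codon excluded
    (an assumption about the convention used in this report).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 407 bp to 1366 bp: 960 nucleotides, i.e. 320 codons
print(orf_peptide_length(407, 1366))
```

Under these assumptions the 960-nucleotide span corresponds to 320 codons, in agreement with the stated peptide length.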



1 MYHCHSGSKP TEKGANEYAY AKWKLCSASA ICFIFMIAEV VGGHIAGSLA 
51 VVTDAAHLLI DLTSFLLSLF SLWLSSKPPS KRLTFGWHRA EILGALLSIL 
101 CIWVVTGVLV YLACERLLYP DYQIQATVMI IVSSCAVAAN IVLTVVLHQR 
151 CLGHNHKEVQ ANASVRAAFV HAPGDLFQSI SVLISALIIY FKPEYKIADP 
201 ICTFIFSILV LASTITILKD FSILLMEGVP KSLNYSGVKE LILAVDGVLS 
251 VHCLHIWSLT MNQVILSAHV ATAASRDSQV VRREIAKALS KSFTMHSLTI 
301 QMESPVDQDP DCLFCEDPCD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62f10, frame 2 

PIR:S70632 zinc transporter ZnT-2 - rat, N = 1, Score = 884, P = 
1.5e-88 

TREMBL:MMU76007_1 gene: "ZnT-3"; product: "ZnT-3"; Mus musculus zinc 
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 772, P = 
1.1e-76 

TREMBL:HSU76010_1 gene: "ZnT-3"; product: "ZnT-3"; Human putative zinc 
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 742, P = 
1.6e-73 

TREMBL:MJMUZNT02_1 gene: "ZnT-3"; product: "zinc transporter"; Mus 
musculus zinc transporter (ZnT-3) gene, complete cds., N = 1, Score = 
715, P = 1.2e-70 

TREMBL:CET18D3_3 gene: "T18D3.3"; Caenorhabditis elegans cosmid T18D3, 
N = 1, Score = 699, P = 5.9e-69 

>PIR:S70632 zinc transporter ZnT-2 - rat 
Length = 359 

HSPs : 

Score = 884 (132.6 bits), Expect = 1.5e-88, P = 1.5e-88 
Identities = 171/326 (52%), Positives = 230/326 (70%) 
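
The percentages in the line above appear to be obtained by integer truncation rather than rounding (230/326 is about 70.6%, reported as 70%). The following is a minimal Python sketch of that calculation; the truncation rule is an inference from the reported figures, and the function name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, alignment_length: int) -> int:
    """Percentage of matching positions, truncated to an integer.

    Truncation (rather than rounding) is assumed because it reproduces
    the figures printed in the report.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // alignment_length


print(blast_percent(171, 326))  # identities
print(blast_percent(230, 326))  # positives
```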



Query:   2 
           ++CH+ +E A+ KL ASAIC +FMI E++GG++A SLA++TDAAHLL D 
Sbjct:  34 

Query:  62 
           S L+SLFSLW+SS+P +K + FGW RAEILGALLS+L IWVVTGVLVYLA +RL+ D 
Sbjct:  94 

Query: 122 YQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNHKEVQANASVRAAFVHAPG 174 
           Y+I+ M+I S CAVA NI++ + LHQ GH+H + Q N SVRAAF+H G 
Sbjct: 154 

Query: 175 
           DL QS+ VL++A I I YFKPEYK DPICTF+FSILVL +T+TIL+D ++LMEG PK ++ 
Sbjct: 214 

Query: 235 
           ++ VK L+L+VDGV ++H LHIW+LT+ Q +LS H+A A + D+Q V + 
Sbjct: 274 

Query: 295 MHSLTIQMESPVDQDPDCLFCEDPCD 320 
            H++TIQ+ES + C C+ P + 
Sbjct: 334 FHTMTIQIESYSEDMKSCQECQGPSE 359 



Pedant information for DKFZphfbr2_62f10, frame 2 
Report for DKFZphfbr2_62f10.2 



[LENGTH] 320 
[MW] 35053.51 
[pi] 6.48 
[HOMOL] PIR:S70632 zinc transporter ZnT-2 - rat 3e-84 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YMR243c] 2e-16 
[FUNCAT] 13.01 homeostasis of metal ions [S. cerevisiae, YMR243c] 2e-16 
[FUNCAT] 03.19 cellular import [S. cerevisiae, YMR243c] 2e-16 
[FUNCAT] 11.07 detoxification [S. cerevisiae, YMR243c] 2e-16 
[FUNCAT] 07.04.01 metal ion transporters (cu, fe, etc.) [S. cerevisiae, YMR243c] 2e-16 
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YOR316c] 3e-13 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YOR316c] 3e-13 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR205w] 4e-07 
[PIRKW] transmembrane protein 2e-30 
[PIRKW] mitochondrial inner membrane 6e-12 
[PIRKW] mitochondrion 6e-12 
[PIRKW] membrane protein 1e-11 
[SUPFAM] zinc transporter ZnT-2 2e-30 
[SUPFAM] membrane protein czcD 1e-11 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 1 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 4 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] TRANSMEMBRANE 5 
[KW] LOW COMPLEXITY 8.12 % 



SEQ MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLI 

SEG xxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYP 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 

MEM MMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DYQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNHKEVQANASVRAAFVHAPGDLFQSI 

SEG 

PRD cccccccceeeehhhhhhhhhhhhhhhhhcccccccccccccchhhhhhhhhhhhhchhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . . 



SEQ SVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKE 

SEG 

PRD hhhhhhhhhhcccceeeccchhhhhhhhhhhhhchhhhhhhheeeeeccccccchhhhhh 

MEM . . MMMMMMMMMMMMMMMMMMMM 

SEQ LILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI 

SEG 

PRD hhhhhhceeecccceeeeeccchhhhheeeeeccccchhhhhhhhhhhhhhhhcccccee 

MEM 

SEQ QMESPVDQDPDCLFCEDPCD 

SEG 

PRD eeeccccccccccccccccc 

MEM 



Prosite for DKFZphfbr2_62f10.2 

PS00001 162->166 ASN_GLYCOSYLATION PDOC00001 
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 
PS00004 81->85 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005 
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005 
PS00005 80->83 PKC_PHOSPHO_SITE PDOC00005 
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005 
PS00006 304->308 CK2_PHOSPHO_SITE PDOC00006 
PS00007 13->21 TYR_PHOSPHO_SITE PDOC00007 
PS00008 7->13 MYRISTYL PDOC00008 
PS00008 42->48 MYRISTYL PDOC00008 
PS00008 94->100 MYRISTYL PDOC00008 
PS00008 228->234 MYRISTYL PDOC00008 
PS00013 125->136 PROKAR_LIPOPROTEIN PDOC00013 

(No Pfam data available for DKFZphfbr2_62f10.2) 
DKFZphfbr2_62n10 



group: brain derived 

DKFZphfbr2_62n10 encodes a novel 541 amino acid protein with similarity to 
Plasmodium vivax reticulocyte-binding protein 1. 

The novel protein contains one leucine zipper, a motif involved in protein-protein interaction. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 
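
The leucine zipper mentioned above corresponds to the pattern L-x(6)-L-x(6)-L-x(6)-L, that is, four leucines spaced seven residues apart. The following Python sketch, offered as an illustration under stated assumptions, scans residues 101 to 130 of the peptide (taken from the peptide listing for frame 2) and reports hits in the start->start+22 form that the Prosite listing appears to use. The names `FRAGMENT`, `FRAGMENT_START` and `leucine_zippers` are hypothetical.

```python
import re

FRAGMENT_START = 101  # 1-based position of the first residue of FRAGMENT
FRAGMENT = "VDKLKEANKKLKLENGGLVRENLRLKAEVD"  # residues 101-130

# L-x(6)-L-x(6)-L-x(6)-L; the lookahead allows overlapping hits.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(seq: str, first_pos: int = 1) -> list[tuple[int, int]]:
    """Return (start, start + 22) pairs, numbered from first_pos."""
    hits = []
    for m in ZIPPER.finditer(seq):
        start = first_pos + m.start()
        hits.append((start, start + len(m.group(1))))
    return hits


print(leucine_zippers(FRAGMENT, FRAGMENT_START))
```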



similarity to reticulocyte-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="13" 
Insert length: 3522 bp 

Poly A stretch at pos. 3503, polyadenylation signal at pos. 3479 



1 GGGGCGTGTT GGCGGGATTC TGAACGCTGC CATGGCTCAG ACCGTGTAGA 

51 ATGTTACATT GTCGCTCACT CTGCCCATCA CGTGCCACAT TTCCTTGGGG 

101 AAGGTACGTC AGCCTGTCAT ATGCATCAAC AACCATGTAT TTTGTTCGAT 

151 TTGTATTGAT TTGTGGTTGA AGAATAATAG CCAGTGTCCA GCTTGCAGAG 

201 TCCCCATCAC TCCTGAAAAT CCTTGCAAAG AAATTATAGG AGGAACAAGT 

251 GAAAGTGAAC CTATGCTAAG CCATACGGTC AGGAAGCATC TTCGGAAAAC 

301 TAGACTTGAA TTACTACACA AAGAATATGA GGACGAAATA GATTGTTTAC 

351 AGAAAGAAGT AGAAGAGCTT AAGAGTAAAA ATCTCAGCTT GGAGTCACAG 

401 ATCAAAGCTA TTCTGGATCC TTTAACCTTG GTGCAGGGCA ACCAAAATGA 

451 AGACAAACAT CTAGTCACAG ATAATCCAAG TATAATTAAC CCAGAAACTG 

501 TAGCAGAGTG GAAGAAAAAA CTCAGAACAG CTAATGAAAT CTATGAAAAA 

551 GTGAAAGATG ATGTGGATAA GCTAAAGGAG GCAAATAAAA AATTGAAATT 

601 GGAAAATGGT GGTCTGGTGA GGGAGAATTT ACGACTGAAG GCTGAAGTTG 

651 ATAACAGATC ACCTCAAAAG TTTGGAAGGT TTGCAGTTGC TGCTCTTCAG 

701 TCCAAAGTAG AACAGTATGA GCGTGAAACC AATCGCCTCA AGAAAGCCCT 

751 GGAACGAAGT GATAAGTATA TAGAGGAACT AGAATCTCAA GTTGCACAGC 

801 TAAAAAATTC AAGTGAAGAG AAAGAGGCTA TGAATTCCAT TTGCCAGACA 

851 GCACTTTCTG CAGATGGCAA AGGGAGCAAA GGCAGTGAGG AGGATGTGGT 

901 GTCAAAGAAT CAAGGCGATA GTGCCAGAAA GCAGCCTGGC TCATCCACCT 

951 CCAGTTCTTC TCACCTAGCG AAGCCTTCCA GCAGCAGACT GTGTGACACC 

1001 AGTTCTGCAA GGCAGGAAAG TACCAGCAAA GCAGACCTTA ACTGTTCTAA 

1051 GAACAAAGAC CTATATCAAG AACAGGTAGA AGTAATGTTA GATGTGACAG 

1101 ATACAAGTAT GGATACTTAT TTGGAAAGAG AATGGGGGAA TAAACCAAGT 

1151 GACTGTGTAC CCTACAAAGA TGAAGAACTT TATGATTTTC CAGCTCCTTG 

1201 TACTCCTTTG TCCCTTAGTT GCCTTCAGCT CAGTACTCCA GAAAATAGAG 

1251 AGAGCTCTGT GGTCCAAGCA GGAGGTTCCA AAAAGCACTC AAACCATCTC 

1301 AGAAAATTGG TGTTTGATGA TTTTTGTGAT TCTTCAAATG TTTCTAATAA 

1351 AGATTCTTCA GAAGATGATA TAAGTAGAAG TGAAAATGAG AAGAAATCAG 

1401 AATGTTTTTC TTCCACAAAG ACAGGATTTT GGGACTGTTG TTCCACAAGC 

1451 TATGCCCAAA ACTTAGATTT TGAAAGTTCA GAGGGGAACA CGATAGCAAA 

1501 TTCTGTTGGA GAAATATCTT CAAAATTGAG TGAGAAATCA GGCTTATGTT 

1551 TATCCAAAAG GTTGAATTCT ATTCGCTCTT TTGAAATGAA CCGGACAAGA 

1601 ACATCCAGTG AAGCATCGAT GGATGCTGCT TACCTTGACA AAATCTCTGA 

1651 GTTGGATTCA ATGATGTCAG AGTCAGACAA CAGCAAGAGC CCTTGTAATA 

1701 ACGGTTTTAA GTCACTGGAT TTGGATGGGT TATCAAAGTC ATCTCAAGGC 

1751 AGTGAATTTC TTGAGGAACC TGATAAGTTG GAAGAAAAAA CTGAGCTAAA 

1801 CCTTTCCAAA GGTTCTCTAA CTAATGATCA GTTAGAAAAT GGAAGTGAAT 

1851 GGAAACCCAC TTCTTTTTTT TCTCCTCTCT CCATCTGACC AAGAAATGAA 

1901 TGAAGATTTT TCACTCCATT CCAGTTCTTG TCCAGTAACT AATGAAATCA 

1951 AACCCCCAAG CTGCTTGTTT CAGACAGAGT TTTCCCAGGG CATTTTGTTA 

2001 AGCAGTTCAC ATCGACTATT GGAAGATCAA AGATTTGGGT CATCTTTGTT 

2051 TAAGATGTCC TCAGAGATGC ACAGTCTTCA TAACCACCTT CAGTCTCCTT 

2101 GGTCTACTTC CTTTGTGCCT GAAAAGAGGA ATAAAAATGT GAATCAATCA 

2151 ACAAAAAGAA AAATCCAGAG CAGCCTTTCC AGTGCCAGCC CATCAAAAGC 

2201 AACTAAAAGT TGACTCATTA GAAAGGTGTC ATTTGTGGTT TTGTCCTGAG 

2251 AGAAATAGAA AAGTTGTTAA AGTTACCTTT TTTCCTCATA AAAGTTCTAT 

2301 ACAAATTGGA ATTGATAATC TTTAGTCAAG TATCAAGTCA GGATGGTGGA 

2351 TTAACCTGTA CCCAGAATAC TTATTGTTCA TTTTGAAAAG ACTTTGTTCT 

2401 TTTCATTTTT ATTTGGGAGT CTTTGTGACC AGAGAAGTTA GGGAGGAGGT 

2451 TATTTTTGTG TTTTGGGGTT GGTTGGTTGG TTGGTTTTGT TTTTGGTTTT 

2501 GTTTTTTTAC TGAATTTGAT ATGTATCTCG GTTGGATATA CATTGTTTTT 

2551 TTAAAAAATG TTATTTAACT GTTAGATACA GTGGCCTGTT GATAAGCCCC 

2601 ACTTGTCTTC AGAACTTGGA TTTCTTAAAT AAAACTTTTA GTGTTGTCTA 

2651 TACACTGCTC AATAAGACAC TTGAGTTTAA GCTTTTCCCA GGGTGGAAAT 

2701 TATTTTACCT GTCCCTTTTT ATTTATGTTT AGTGATGGCC TAGTTTTTCT 

2751 GCAGGGCCAT GATGGAGAAA TAGCACTCTA GCCTTAGTCC AATATTGATT 
2801 TACTTTCTTT TTTTAGGTTT TATGTATATG TTTGCATTTT TTAGCATTGT 

2851 GTTTTGTCCA GTTTTGTGAA AATGTTCTGC TAGTATGAAA GAAAACATTT 
2901 TCTATATGAA GACATTTGTT TTATGTTAGG TAGCTTACAT TTTCTCCTCT 
2951 GCGTGTGTGT GTATGTGTGT AAAATCAGAA ATTTAGCATA CTATGGAAAG 
3001 AAGGCATGGA GCACTTGGGT TTAGAGGAAC CTAAAACATC ATAGCTTCAT 
3051 TGTTCCAGAT GTAACAGGTT TGAAAGAGCT CATCGCCAAG TTCTTGATCC 
3101 ACTTGCATTC CAGGGGAGTT CTCTTTTGAG TAGTATGTTT CTTGTTTGCA 
3151 TGTTCCTGTT CTTTGTGGAA ACTATGCATG GTAGCATTTT TGCTTGCTGT 
3201 GTTTTCCATA CTTAAGAAAA AGAGGTTTCA GTTGGCTGAT AGAATATCTT 
3251 TTATGTAGGA CAAAACTTTT CTGTGAAGAG TGTTGAGGGG GTGAAGATAG 
3301 GTAAGAGGTA AGCACAATTT TTAATTTAGG CTCTGAAAAA GTGTATTGTT 
3351 CTAAACGTAT TTGGTATGCC TATATAGGTC TTTAAAAATG GGTTTGTATG 
3401 CTGTTTAATG TGCACTGAAC ATTTTACATT AATATTGTAC TGTTTTACAT 
3451 TAATACTGCA TGCTTTTCTA TGTGAATTGA ATAAAGAATG TCATAAGCAC 
3501 TGGAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry HS658254 from database EMBL: 
human STS SHGC-11774. 
Score = 1643, P = 8.0e-67, identities = 345/355 

Entry HS513217 from database EMBL: 
human STS SHGC-14656. 
Score = 1193, P = 5.8e-46, identities = 241/244 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 263 bp to 1885 bp; peptide length: 541 
Category: similarity to known protein 



1 MLSHTVRKHL RKTRLELLHK EYEDEIDCLQ KEVEELKSKN LSLESQIKAI 

51 LDPLTLVQGN QNEDKHLVTD NPSIINPETV AEWKKKLRTA NEIYEKVKDD 

101 VDKLKEANKK LKLENGGLVR ENLRLKAEVD NRSPQKFGRF AVAALQSKVE 

151 QYERETNRLK KALERSDKYI EELESQVAQL KNSSEEKEAM NSICQTALSA 

201 DGKGSKGSEE DVVSKNQGDS ARKQPGSSTS SSSHLAKPSS SRLCDTSSAR 

251 QESTSKADLN CSKNKDLYQE QVEVMLDVTD TSMDTYLERE WGNKPSDCVP 

301 YKDEELYDFP APCTPLSLSC LQLSTPENRE SSVVQAGGSK KHSNHLRKLV 

351 FDDFCDSSNV SNKDSSEDDI SRSENEKKSE CFSSTKTGFW DCCSTSYAQN 

401 LDFESSEGNT IANSVGEISS KLSEKSGLCL SKRLNSIRSF EMNRTRTSSE 

451 ASMDAAYLDK ISELDSMMSE SDNSKSPCNN GFKSLDLDGL SKSSQGSEFL 

501 EEPDKLEEKT ELNLSKGSLT NDQLENGSEW KPTSFFSPLS I 

BLASTP hits 

Entry A42771 from database PIR: 

reticulocyte-binding protein 1 - Plasmodium vivax 

Score = 127, P = 3.7e-08, identities = 68/300, positives = 145/300 

Entry RBP1_PLAVB from database SWISSPROT: 
RETICULOCYTE BINDING PROTEIN 1 PRECURSOR. 

Score = 127, P = 3.9e-08, identities = 68/300, positives = 145/300 
Entry MMDSPPG_1 from database TREMBL: 

gene: "DSPP"; product: "dentin sialophosphoprotein"; Mus musculus DSPP 
gene 

Score = 160, P = 5.2e-08, identities = 87/373, positives = 146/373 



Alert BLASTP hits for DKFZphfbr2_62n10, frame 2 
No Alert BLASTP hits found 

Report for DKFZphfbr2_62n10.2 

[LENGTH] 541 
[MW] 60533.06 
[pi] 5.10 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 3e-05 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 3e-05 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 18 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 14 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] All Alpha 
[KW] LOW COMPLEXITY 9.24 % 
[KW] COILED COIL 22.55 % 



SEQ MLSHTVRKHLRKTRLELLHKEYEDEIDCLQKEVEELKSKNLSLESQIKAILDPLTLVQGN 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhcccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QNEDKHLVTDNPSIINPETVAEWKKKLRTANEIYEKVKDDVDKLKEANKKLKLENGGLVR 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccceeeeeccccccccchhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccceee 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ ENLRLKAEVDNRSPQKFGRFAVAALQSKVEQYERETNRLKKALERSDKYIEELESQVAQL 

SEG 

PRD ehhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KNSSEEKEAMNSICQTALSADGKGSKGSEEDVVSKNQGDSARKQPGSSTSSSSHLAKPSS 

SEG xxxxxxxxxxxxxx 

PRD hcchhhhhhhhhhhhhhhccccccccccceeeeecccccccccccccccccccccccccc 

COILS CCCCCC 

SEQ SRLCDTSSARQESTSKADLNCSKNKDLYQEQVEVMLDVTDTSMDTYLEREWGNKPSDCVP 

SEG x 

PRD ccccccccccccccccccccccccchhhhhhhhhcccccccccchhhhhhhccccccccc 

COILS 

SEQ YKDEELYDFPAPCTPLSLSCLQLSTPENRESSVVQAGGSKKHSNHLRKLVFDDFCDSSNV 

SEG 

PRD cccccccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccc 

COILS 

SEQ SNKDSSEDDISRSENEKKSECFSSTKTGFWDCCSTSYAQNLDFESSEGNTIANSVGEISS 

SEG 

PRD cccccccchhhhhccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ KLSEKSGLCLSKRLNSIRSFEMNRTRTSSEASMDAAYLDKISELDSMMSESDNSKSPCNN 

SEG 

PRD ccccccccchhhhhcccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

COILS 

SEQ GFKSLDLDGLSKSSQGSEFLEEPDKLEEKTELNLSKGSLTNDQLENGSEWKPTSFFSPLS 

SEG . . xxxxxxxxxxxxxxx 

PRD ccccccccccccccccceeecccchhhhhhhhhccccccccccccccccccccccccccc 

COILS 

SEQ I 
SEG 

PRD c 
COILS 



Prosite for DKFZphfbr2_62n10.2 

PS00001 40->44 ASN_GLYCOSYLATION PDOC00001 
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001 
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 
PS00001 359->363 ASN_GLYCOSYLATION PDOC00001 
PS00001 443->447 ASN_GLYCOSYLATION PDOC00001 
PS00001 513->517 ASN_GLYCOSYLATION PDOC00001 
PS00001 526->530 ASN_GLYCOSYLATION PDOC00001 
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 
PS00005 156->159 PKC_PHOSPHO_SITE PDOC00005 
PS00005 166->169 PKC_PHOSPHO_SITE PDOC00005 
PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005 
PS00005 240->243 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 361->364 PKC_PHOSPHO_SITE PDOC00005 
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005 
PS00005 419->422 PKC_PHOSPHO_SITE PDOC00005 
PS00005 423->426 PKC_PHOSPHO_SITE PDOC00005 
PS00005 431->434 PKC_PHOSPHO_SITE PDOC00005 
PS00005 436->439 PKC_PHOSPHO_SITE PDOC00005 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006 
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006 
PS00006 285->289 CK2_PHOSPHO_SITE PDOC00006 
PS00006 324->328 CK2_PHOSPHO_SITE PDOC00006 
PS00006 361->365 CK2_PHOSPHO_SITE PDOC00006 
PS00006 365->369 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006 
PS00006 414->418 CK2_PHOSPHO_SITE PDOC00006 
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006 
PS00006 462->466 CK2_PHOSPHO_SITE PDOC00006 
PS00006 469->473 CK2_PHOSPHO_SITE PDOC00006 
PS00007 294->302 TYR_PHOSPHO_SITE PDOC00007 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 226->232 MYRISTYL PDOC00008 
PS00008 292->298 MYRISTYL PDOC00008 
PS00008 408->414 MYRISTYL PDOC00008 
PS00008 427->433 MYRISTYL PDOC00008 
PS00008 489->495 MYRISTYL PDOC00008 
PS00008 517->523 MYRISTYL PDOC00008 
PS00013 310->321 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 104->126 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphfbr2_62n10.2) 
DKFZphfbr2_62o17 



group: metabolism 

DKFZphfbr2_62o17.2 encodes a novel 282 amino acid protein with weak similarity to the 
apolipoprotein E receptor. 

The new protein contains a leucine zipper for protein-protein interaction and three 
LDL-receptor class A domain (LDLRA_1) patterns. In LDL receptors, the class A domains form the 
binding site for LDL and calcium. The acidic residues between the fourth and sixth cysteines 
are important for high-affinity binding of positively charged sequences in LDLR's ligands. 

The new protein can find application in modulation of cholesterol binding and transport by 
LDL receptors and LDL-binding proteins. 



similarity to apolipoprotein E receptor 

complete cDNA, complete cds, start at bp 56 matches Kozak consensus 
ANCatg, EST hits 
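
The statement that the start at bp 56 matches the consensus ANCatg can be checked directly on the bases surrounding the start codon. The following Python sketch is a simplified illustration: it reads "ANCatg" as requiring A at position -3 and C at position -1 relative to the ATG (with any base at -2), which is an assumed interpretation of the notation used in this report. The names `FRAGMENT` and `matches_anc_atg` are hypothetical.

```python
# Bases 51-60 of the insert; the ATG at bp 56 is at index 5 of this fragment.
FRAGMENT = "ACAGCATGAG"


def matches_anc_atg(seq: str, atg_index: int) -> bool:
    """Check for A at -3 and C at -1 before an ATG.

    atg_index is the 0-based index of the A of the ATG within seq.
    """
    seq = seq.upper()
    if atg_index < 3 or seq[atg_index:atg_index + 3] != "ATG":
        return False
    upstream = seq[atg_index - 3:atg_index]
    return upstream[0] == "A" and upstream[2] == "C"


print(matches_anc_atg(FRAGMENT, 5))
```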

Sequenced by LMU 

Locus : unknown 

Insert length: 1260 bp 

Poly A stretch at pos. 1240, polyadenylation signal at pos. 1218 
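
The positions quoted above can be reproduced from the last 60 bases of the insert if they are read as 0-based offsets. The following Python sketch searches that tail for a polyadenylation signal (AATAAA or its common variant ATTAAA) and for the first run of at least ten A residues. The 0-based reading, the choice of signal variants and the run-length threshold are assumptions made for illustration, and the names `TAIL`, `TAIL_START` and `first_offset` are hypothetical.

```python
import re

TAIL_START = 1200  # 0-based offset of the first base of TAIL (bp 1201, 1-based)
TAIL = (
    "CCCCGTCTGA" "GGGTGGCGAT" "TAAAGTTGCT" "TCACATCCTC"
    "AAAAAAAAAA" "AAAAAAAAAC"
)


def first_offset(pattern: str, seq: str, base_offset: int = 0):
    """Return the 0-based offset of the first match, or None if absent."""
    m = re.search(pattern, seq)
    return None if m is None else base_offset + m.start()


signal_pos = first_offset(r"A[AT]TAAA", TAIL, TAIL_START)
polya_pos = first_offset(r"A{10,}", TAIL, TAIL_START)
print(signal_pos, polya_pos)
```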



1 GGGGGATAAG AGAGCGGTCT GGACAGCGCG TGGCCGGCGC CGCTGTGGGG 
51 ACAGCATGAG CGGCGGTTGG ATGGCGCAGG TTGGAGCGTG GCGAACAGGG 
101 GCTCTGGGCC TGGCGCTGCT GCTGCTGCTC GGCCTCGGAC TAGGCCTGGA 
151 GGCCGCCGCG AGCCCGCTTT CCACCCCGAC CTCTGCCCAG GCCGCAGGCC 
201 CCAGCTCAGG CTCGTGCCCA CCCACCAAGT TCCAGTGCCG CACCAGTGGC 
251 TTATGCGTGC CCCTCACCTG GCGCTGCGAC AGGGACTTGG ACTGCAGCGA 
301 TGGCAGCGAT GAGGAGGAGT GCAGGATTGA GCCATGTACC CAGAAAGGGC 
351 AATGCCCACC GCCCCCTGGC CTCCCCTGCC CCTGCACCGG CGTCAGTGAC 
401 TGCTCTGGGG GAACTGACAA GAAACTGCGC AACTGCAGCC GCCTGGCCTG 
451 CCTAGCAGGC GAGCTCCGTT GCACGCTGAG CGATGACTGC ATTCCACTCA 
501 CGTGGCGCTG CGACGGCCAC CCAGACTGTC CCGACTCCAG CGACGAGCTC 
551 GGCTGTGGAA CCAATGAGAT CCTCCCGGAA GGGGATGCCA CAACCATGGG 
601 GCCCCCTGTG ACCCTGGAGA GCGTCACCTC TCTCAGGAAT GCCACAACCA 
651 TGGGGCCCCC TGTGACCCTG GAGAGTGTCC CCTCTGTCGG GAATGCCACA 
701 TCCTCCTCTG CCGGAGACCA GTCTGGAAGC CCAACTGCCT ATGGGGTTAT 
751 TGCAGCTGCT GCGGTGCTCA GTGCAAGCCT GGTCACCGCC ACCCTCCTCC 
801 TTTTGTCCTG GCTCCGAGCC CAGGAGCGCC TCCGCCCACT GGGGTTACTG 
851 GTGGCCATGA AGGAGTCCCT GCTGCTGTCA GAACAGAAGA CCTCGCTGCC 
901 CTGAGGACAA GCACTTGCCA CCACCGTCAC TCACCCCTGG GCGTAGCCGG 
951 ACAGGAGGAG AGCAGTGATG CGGATGGGTA CCCGGGCACA CCAGCCCTCA 
1001 GAGACCTGAG CTCTTCTGGC CACGTGGAAC CTCGAACCCG AGCTCCTGCA 
1051 GAAGTGGCCC TGGAGATTGA GGGTCCCTGG ACACTCCCTA TGGAGATCCG 
1101 GGGAGCTAGG ATGGGGAACC TGCCACAGCC AGAACCGAGG GGCTGGCCCC 
1151 AGGCAGCTCC CAGGGGGTAG GACGGCCCTG TGCTTAAGAC ACTCCTGCTG 
1201 CCCCGTCTGA GGGTGGCGAT TAAAGTTGCT TCACATCCTC AAAAAAAAAA 
1251 AAAAAAAAAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 56 bp to 901 bp; peptide length: 282 
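
The relationship between the ORF and the peptide can be illustrated by translating the first bases of the ORF with the standard genetic code. The following Python sketch is a minimal example only; it builds the codon table from the conventional TCAG ordering and translates the 45 bases beginning at bp 56 of the insert. The names `CODON_TABLE`, `translate` and `ORF_START` are hypothetical and do not refer to any tool used to prepare this report.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1 until the first stop codon or end of sequence."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 56-100 of the insert (the first 15 codons of the ORF)
ORF_START = "ATGAGCGGCGGTTGGATGGCGCAGGTTGGAGCGTGGCGAACAGGG"
print(translate(ORF_START))
```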

Category: similarity to known protein 

Classification: unset 

Prosite motifs: LDLRA_1 (67-90) 

LDLRA_1 (67-90) 

LDLRA_1 (145-168) 
LEUCINE_ZIPPER (17-39) 



1 MSGGWMAQVG AWRTGALGLA LLLLLGLGLG LEAAASPLST PTSAQAAGPS 
51 SGSCPPTKFQ CRTSGLCVPL TWRCDRDLDC SDGSDEEECR IEPCTQKGQC 
101 PPPPGLPCPC TGVSDCSGGT DKKLRNCSRL ACLAGELRCT LSDDCIPLTW 
151 RCDGHPDCPD SSDELGCGTN EILPEGDATT MGPPVTLESV TSLRNATTMG 
201 PPVTLESVPS VGNATSSSAG DQSGSPTAYG VIAAAAVLSA SLVTATLLLL 
251 SWLRAQERLR PLGLLVAMKE SLLLSEQKTS LP 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_62o17, frame 2 

TREMBL:AF110520_6 product: "NG29"; Mus musculus major 
histocompatibility complex region NG27, NG28, RPS28, NADH 
oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin, 
RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 
genes, complete cds; Sacm21 gene, partial cds; and unknown gene., N = 
1, Score = 733, P = 1.5e-72 

PIR:JE0237 apolipoprotein E receptor 2 precursor - mouse, N = 2, Score 
= 290, P = 1.1e-26 

TREMBL:HSZ75190_1 product: "apolipoprotein E receptor 2 906"; 
H. sapiens mRNA for apolipoprotein E receptor 2, N = 1, Score = 279, P = 
1.8e-23 



>TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility 
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, 
Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta 
1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene, 
partial cds; and unknown gene. 
Length = 260 



HSPs: 



Score = 733 (110.0 bits), Expect = 1.5e-72, P = 1.5e-72 
Identities = 157/276 (56%), Positives = 178/276 (64%) 



Query:   6 MAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQCRTSG 65 
           MA+ GA R ALGL L LL GL GLEAA +P T Q +G + SCP FQC TSG 
Sbjct:   1 MARGGAGRAVALGLVLRLLFGLRTGLEAAPAPAHT--RVQVSGSRADSCPTDTFQCLTSG 58 

Query:  66 LCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGTDKKLR 125 
           CVPL+WRCD D DCSDGSDEE+CRIE C Q GQC P LPC C +S CS +DK L 
Sbjct:  59 YCVPLSWRCDGDQDCSDGSDEEDCRIESCAQNGQCQPQSALPCSCDNISGCSDVSDKNL- 117 

Query: 126 NCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATTMGPPV 185 
           NCSR C EL C L D CIP TWRCDGHPDC DSSDEL C T+ 
Sbjct: 118 NCSRPPCQESELHCILDDVCIPHTWRCDGHPDCLDSSDELSCDTD T 163 

Query: 186 TLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSASLVTA 245 
           ++ + NATT T+E+ S NT +SAGD S +P+AYGVIAAA VLSA LV+A 
Sbjct: 164 EIDKIFQEENATTTRISTTMENETSFRNVTFTSAGDSSRNPSAYGVIAAAGVLSAILVSA 223 

Query: 246 TLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSL 281 
           TLL+L LR Q LP GLLVA+KESLLLSE+KTSL 
Sbjct: 224 TLLILLRLRGQGYLPPPGLLVAVKESLLLSERKTSL 259 





Pedant information for DKFZphfbr2_62o17, frame 2 



Report for DKFZphfbr2_62o17.2 



[LENGTH] 282 

[MW] 28991.19 

[pi] 4.61 

[HOMOL] TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility 

complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein, 
BING1, tapasin, RalGDS-like, KE2, BING4, beta 1, 3-galactosyl transferase, and RPS18 genes, 
complete cds; Sacm21 gene, partial cds; and unknown gene. 5e-55 
[BLOCKS] BL01209 LDL-receptor class A (LDLRA) domain proteins 

[SCOP] d1ajj 7.11.1.1.1 Ligand-binding domain of low-density lipoprotei 2e-10 
[PIRKW] receptor 1e-19 
[PIRKW] glycoprotein 1e-19 
[PIRKW] lipid transport 4e-18 
[PIRKW] LDL 5e-14 
[PIRKW] calcium binding 6e-18 
[PIRKW] extracellular protein 6e-13 
[PIRKW] alternative splicing 1e-19 
[PIRKW] extracellular matrix 3e-10 
[PIRKW] chondroitin sulfate proteoglycan 2e-12 


r PT RPTW 1 


t~\~\ & o -f* d T - /~\ i ^i p — 1 ft 




1 pnri nf-ri rh 1 ri h a — ") — fil urnnrotp i n rprvpat- K oron 1 nnu 1 p- 

1CUL111C I cl _L 1 I Cl Z_ yiyi,upLULC±ll -L C^CG L 11 Ul L I t_J _L U y y J_ ~ 


[ our i nil J 


T.flT. rprpnt"rti" YWT D-rAnt^i ni nrr tpiip^i t" h omfi 1 nrrv 1 p-1 Q 

±J LJAj L C LCp L Ul 1 Ml U ^<vLl LUllllW^ 1 CyCQ L 11 11 '^J _1_ ' y J J. C _L ^ 


t SUPFAM] 


trypsin homology 6e — 13 


f QHPFAMl 
lourc Mil j 


s 1 nha-9-inai">*riAl /^Vm 1 1 i ri KP^onfrti* ftp — 1 ft 

d X LJ 1 1 Ct i— IlLcL LUHlviJLllXLl X- tz: C; \J L. J_ v> C; AO 


I O U C C ni l J 


T TIT rprcnhnr 1 p— 1 Q 


[ SUPFAM] 


LDL receptor 1 igand-bindi nc[ repeat homology le- 19 


[SUPFAM] 


EGF homology le-19 


[PROSITE] 


LDLRA 1 3 


[PROSITE] 


LEUCINE_ZIPPER 1 


[PFAM] 


Low-density lipoprotein receptor domain class A 


[PFAM] 


TNFR/NGFR cysteine-rich region 


[KH] 


SIGNAL PEPTIDE 31 


[KW] 


TRANSMEMBRANE 1 


[KW] 


LOW COMPLEXITY 22.34 % 



SEQ MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQ 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccceee 

MEM 

SEQ CRTSGLCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGT 

SEG xxxxxxxxxxx 

PRD ecccccceeeeecccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ DKKLRNCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATT 

SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ MGPPVTLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSA 

SEG xxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

MEM MMMMMMM 

SEQ SLVTATLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSLP 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMM 



Prosite for DKFZphfbr2_62ol7.2 

PS01209 67->90 LDLRA_1 PDOC00929
PS01209 67->90 LDLRA_1 PDOC00929
PS01209 145->168 LDLRA_1 PDOC00929
PS00029 17->39 LEUCINE_ZIPPER PDOC00029

Pfam for DKFZphfbr2_62ol7.2 



HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtD. WNHvpqClpC . trCePEMGQYMvqPCTwTQNT . VC* 

CP+ ++ + + C+P RC+ ++ +c + ++ +c 

Query 54 CPPTKFQCRTS — GLCVPLTWRCDR — DL DCSDGSDEEEC 
HMM_NAME Low-density lipoprotein receptor domain class A 

HMM *tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 

C P +FQC+++ C+P+ W+CD D DC D+SDE E+C+ 
Query 52 GSCP-PTKFQCRTSG-LCVPLTWRCDRDLDCSDGSDE--EECRI 91 

54.99 (bits) f: 130 t: 169 Target: dkfzphfbr2_62ol7.2 similarity to apolipoprotein E receptor 

Alignment to HMM consensus: 
Query *tTCeGPDEFQCgSGeMRCIPMsWvCDGDpDCeDWSDEWPeNChp* 

C + E +C + CIP+ W+CDG PDC D SDE ++C+ 

dkfzphfbr2 130 LACL-AGELRCTLSD-DCIPLTWRCDGHPDCPDSSDE--LGCGT 169 
DKFZphfbr2_64al5 

group: nucleic acid management 

DKFZphfbr2_64al5 encodes a novel 255 amino acid protein with strong similarity to inorganic 
pyrophosphatases. 

Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the enzyme responsible for the hydrolysis of 
pyrophosphate (PPi), which is formed as a product of the many biosynthetic reactions that 
utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium 
conferring the highest activity. 

The new protein can find application as a new enzyme for biotechnological processes. 

strong similarity to inorganic pyrophosphatases 

unspliced Intron 212-256 see EST HS1190948 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1188 bp 

Poly A stretch at pos. 1170, polyadenylation signal at pos. 1151 

1 GGGGGTTGGG GACCAGTGCA GGGACCGGGT CGCGCCGTGC TATGGCCCTG 
51 TACCACACTG AGGAGCGCGG CCAGCCCTGC TCGCAGAATT ACCGCCTCTT 

101 CTTTAAGAAT GTAACTGGTC ACTACATTTC CCCCTTICAT GATATTCCTC 

151 TGAAGGTGAA CTCTAAAGAG GACACTGAGG CTCAAGGCAT TTTTATAGAC 

201 TTGTCTAAGA TCTGGAAAAT GGCATTCCTA TGAAGAAAGC ACGAAATGAT 

251 GAATATGAGA ATCTGTTTAA TATGATTGTA GAAATACCTC GGTGGACAAA 

301 GGCTAAAATG GAGATTGCCA CCAAGGAGCC AATGAATCCC ATTAAACAAT 

351 ATGTAAAGGA TGGAAAGCTA CGCTATGTGG CGAATATCTT CCCTTACAAG 

401 GGTTATATAT GGAATTATGG TACCCTCCCT CAGACTTGGG AAGATCCCCA 

451 TGAAAAAGAT AAGAGCACGA ACTGCTTTGG AGATAATGAT CCTATTGATG 

501 TTTGCGAAAT AGGCTCAAAG ATTCTTTCTT GTGGAGAAGT TATTCATGTG 

551 AAGATCCTTG GAATTTTGGC TCTTATTGAT GAAGGTGAAA CAGATTGGAA 

601 ATTAATTGCT ATCAATGCGA ATGATCCTGA AGCCTCAAAG TTTCATGATA 

651 TTGATGATGT TAAGAAGTTC AAACCGGGTT ACCTGGAAGC TACTCTTAAT 

701 TGGTTTAGAT TATGTAAGGT ACCAGATGGA AAACCAGAAA ACCAGTTTGC 

751 TTTTAATGGA GAATTCAAAA ACAAGGCTTT TGCTCTTGAA GTTATTAAAT 

801 CCACTCATCA ATGTTGGAAA GCATTGCTTA TGAAGAACTG TAATGGAGGA 

851 GCTACAAATT GCACAAACGT GCAGATATCT GATAGCCCTT TCCGTTGCAC 

901 TCAAGAGGAA GCAAGATCAT TAGTTGAATC GGTATCATCT TCACCAAATA 

951 AAGAAAGTAA TGAAGAAGAG CAAGTGTGGC ACTTCCTTGG CAAGTGATTG 
1001 AAACATCTGA AATTCTGCTG TCAAGATTCC CATCTCTAAG GACTCCAAGA 
1051 CTCTTTTTCC CCAAGTGCTA GAGACAAGGG GGTCTATGAG CATTTACTGA 
1101 CTTCCTGTTA AAACTTCATT TTTTCAAACT TTTTGAGCTA TGCAATATAT 
1151 AAATAAACAG TAAGAATTTT AAAAAAAAAA AAAAAAAA 
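
The poly(A) and polyadenylation-signal positions quoted for this insert can be checked against the last sequence row. The following is a minimal Python sketch, not the annotation software's own method. The helper names are hypothetical, the tail string is copied from positions 1151 to 1188 of the listing, and the reading of the reported positions as zero-based offsets is an assumption inferred from this listing.

```python
# Tail of the DKFZphfbr2_64al5 insert as listed (1-based positions 1151..1188).
TAIL = "AAATAAACAGTAAGAATTTTAAAAAAAAAAAAAAAAAA"
TAIL_START = 1151  # 1-based coordinate of TAIL[0]


def find_polya_signal(tail, tail_start, signals=("AATAAA", "ATTAAA")):
    """Return the zero-based insert offset of the first signal hexamer, or None.

    The two hexamers are the common polyadenylation signal and one frequent
    variant; the choice of these two is an assumption of this sketch.
    """
    hits = [tail.find(s) for s in signals]
    hits = [h for h in hits if h >= 0]
    if not hits:
        return None
    return (tail_start - 1) + min(hits)


def find_polya_start(tail, tail_start, min_run=10):
    """Return the zero-based offset where the terminal run of A residues begins."""
    run = len(tail) - len(tail.rstrip("A"))
    if run < min_run:
        return None
    return (tail_start - 1) + len(tail) - run


print(find_polya_signal(TAIL, TAIL_START))  # offset of AATAAA
print(find_polya_start(TAIL, TAIL_START))   # offset of the poly(A) run
```

Under the zero-based assumption both values coincide with the positions stated for this clone (1151 and 1170); the ORF coordinates elsewhere in the listing behave as one-based, so the two conventions should not be mixed.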



BLAST Results 



Entry HSPPASEMR from database EMBL: 

H. sapiens partial mRNA for pyrophosphatase. 

Score = 1706, P = 1.6e-70, identities = 342/343 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 230 bp to 994 bp; peptide length: 255 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: PPASE (85-92) 
1 MKKARNDEYE NLFNMIVEIP RWTKAKMEIA TKEPMNPIKQ YVKDGKLRYV 
51 ANIFPYKGYI WNYGTLPQTW EDPHEKDKST NCFGDNDPID VCEIGSKILS 
101 CGEVIHVKIL GILALIDEGE TDWKLIAINA NDPEASKFHD IDDVKKFKPG 
151 YLEATLNWFR LCKVPDGKPE NQFAFNGEFK NKAFALEVIK STHQCWKALL 
201 MKNCNGGATN CTNVQISDSP FRCTQEEARS LVESVSSSPN KESNEEEQVW 
251 HFLGK 
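
The ORF coordinates, reading frame and peptide length given in each "Peptide information" block are mutually consistent, which can be illustrated with a short Python sketch. It assumes that the coordinates are one-based, inclusive and exclude the stop codon (this matches the listed sequence, where ATG begins at position 230 and TGA follows position 994); the function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (reading frame, peptide length) for an ORF.

    start and end are 1-based inclusive nucleotide coordinates that
    exclude the stop codon (an assumption of this sketch).
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    frame = (start - 1) % 3 + 1  # frame 1 starts at nucleotide 1
    return frame, span // 3


# ORF "from 230 bp to 994 bp", reported as frame 2 with peptide length 255.
print(orf_summary(230, 994))
# ORF "from 42 bp to 230 bp", reported as frame 3 with peptide length 63.
print(orf_summary(42, 230))
```

Partial ORFs, such as those described as starting "from the beginning", do not satisfy the multiple-of-three check and are rejected by this sketch rather than guessed at.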

BLASTP hits 

Entry IPYR_KLULA from database SWISSPROT: 

INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO- 
HYDROLASE) (PPASE) . 

Score = 689, P = 6.0e-68, identities = 128/248, positives = 170/248 
Entry A45153 from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - bovine 

Score = 862, P = 2.8e-86, identities = 146/226, positives = 190/226 
Entry AF085600_1 from database TREMBLNEW: 

gene: "Nurf-38"; product: "inorganic pyrophosphatase NURF-38"; 
Drosophila melanogaster inorganic pyrophosphatase NURF-38 (Nurf-38) 
gene, complete cds . 

Score = 731, P = 2.1e-72, identities = 134/248, positives = 177/248 
Entry PWBY from database PIR: 

inorganic pyrophosphatase (EC 3.6.1.1) - yeast (Saccharomyces 
cerevisiae) 

Score = 688, P = 7.7e-68, identities = 133/251, positives = 174/251 



Alert BLASTP hits for DKFZphfbr2_64al5, frame 2 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) 
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 731, P = 
2.4e-72 



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE 
PHOSPHO-HYDROLASE) (PPASE). 
Length = 290 

HSPs : 



Score = 731 (109.7 bits), Expect = 2.4e-72, P = 2.4e-72 
Identities = 134/248 (54%), Positives = 177/248 (71%) 



Query: 7 DEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYIWNYGTL 66

+E + ++NM+VE+PRWT AKMEI+ K PMNPIKQ +K GKLR+VAN FP+KGYIWNYG L
Sbjct: 40 NEEKTIYNMVVEVPRWTNAKMEISLKTPMNPIKQDIKKGKLRFVANCFPHKGYIWNYGAL 99

Query: 67 PQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGETDWKLI 126

PQTWE+P + ST C GDNDPIDV EIG ++ G+V+ VK+LG ALIDEGETDWK+I
Sbjct: 100 PQTWENPDHIEPSTGCKGDNDPIDVIEIGYRVAKRGDVLKVKVLGQFALIDEGETDWKII 159

Query: 127 AINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFKNKAFAL 186

AI+ NDP ASK +DI DV ++ PG L AT+ WF++ K+PDGKPENQFAFNG+ KN FA
Sbjct: 160 AIDVNDPLASKVNDIADVDQYFPGLLRATVEWFKIYKIPDGKPENQFAFNGDAKNADFAN 219

Query: 187 EVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARS-LVESVSSSPNKESNE 245

+I TH+ W+ L+ ++ G+ + TN+ +S +EEA L E+ +E ++
Sbjct: 220 TIIAETHKFWQNLVHQSPASGSISTTMITNRNSEHVIPKEEAEKILAEAPDGGQVEEVSD 279

Query: 246 EEQVWHFL 253

WHF+
Sbjct: 280 TVDTWHFI 287





Peptide information for frame 3 



ORF from 42 bp to 230 bp; peptide length: 63 
Category: strong similarity to known protein 
Classification: unset 



1 MALYHTEERG QPCSQNYRLF FKNVTGHYIS PFHDIPLKVN SKEDTEAQGI 
51 FIDLSKIWKM AFL 
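
The start of this frame 3 peptide can be reproduced from the insert by conceptual translation. The sketch below is illustrative only: it builds the standard genetic code table, takes the first 71 nucleotides of the insert as listed above, and translates from the stated ORF start at position 42. The function and variable names are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start):
    """Translate dna from a 1-based start position.

    Translation stops at the first stop codon or when fewer than three
    nucleotides remain; unrecognised codons are written as 'X'.
    """
    peptide = []
    for pos in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 71 nucleotides of the DKFZphfbr2_64al5 insert, copied from the listing.
INSERT_HEAD = (
    "GGGGGTTGGGGACCAGTGCAGGGACCGGGTCGCGCCGTGC"
    "TATGGCCCTGTACCACACTGAGGAGCGCGGC"
)

print(translate(INSERT_HEAD, 42))
```

The ten residues obtained in this way agree with the first ten residues of the peptide listed for frame 3.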
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64al5, frame 3 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) 
(PYROPHOSPHATE PHOSPHO- HYDROLASE) (PPASE)., N = 1, Score = 118, P = 
8.8e-07 

PIR:A45153 inorganic pyrophosphatase (EC 3.6.1.1) - bovine, N = 1, 
Score = 113, P = 3.1e-06 

TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; 
Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds., N 
= 1, Score = 106, P = 1.8e-05 



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE 
PHOSPHO- HYDROLASE) (PPASE). 
Length = 290 

HSPs: 



Score = 118 (17.7 bits), Expect = 8.8e-07, P = 8.8e-07 
Identities = 23/43 (53%), Positives = 29/43 (67%) 

Query: 1 MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKE 43 

MALY T E+G S +Y L+FKN G+ ISP HDIPL N ++ 
Sbjct: 1 MALYETVEKGAKNSPSYSLYFKNKCGNVISPMHDIPLYANEEK 43 
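
The percentages printed on the "Identities" and "Positives" lines can be recomputed from the counts. The Python sketch below parses such a line and recomputes each percentage using integer truncation; truncation, rather than rounding, is an assumption that happens to agree with every HSP header shown in this section (for example 157/276 is printed as 56%). The regular expression and function name are hypothetical.

```python
import re

# Matches e.g. "Identities = 23/43 (53%)" or "Positives = 29/43 (67%)".
HSP_FIELD = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_hsp_percentages(line):
    """Return (label, matches, length, reported %, recomputed %) per field."""
    results = []
    for label, num, den, pct in HSP_FIELD.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        results.append((label, num, den, pct, (100 * num) // den))
    return results


line = "Identities = 23/43 (53%), Positives = 29/43 (67%)"
for label, num, den, reported, recomputed in check_hsp_percentages(line):
    print(label, num, den, reported, recomputed)
```

A mismatch between the reported and recomputed values on an OCR-derived listing would point to a misread digit in either the counts or the percentage.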

Pedant information for DKFZphfbr2_64al5, frame 2 



Report for DKFZphfbr2_64al5.2 

[LENGTH] 255 

[MW] 29177.34 

[pi] 5.67 

[HOMOL] TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds. 2e-93 

[FUNCAT] 01.04.01 phosphate utilization [S. cerevisiae, YBR011c] 9e-73 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR011c] 9e-73 

[FUNCAT] 02.99 other energy generation activities [S. cerevisiae, YMR267w] 1e-58 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR267w] 1e-58 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. genitalium, MG351] 1e-06 

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0124] 2e-06 

[BLOCKS] BL00387D 
[BLOCKS] BL00387C 
[BLOCKS] BL00387B 
[BLOCKS] BL00387A 

[SCOP] d1wgja_ 2.29.5.1.1 Inorganic pyrophosphatase [baker's yeas 1e-113 

[EC] 3.6.1.1 Inorganic pyrophosphatase 7e-92 

[PIRKW] mitochondrion 3e-57 

[PIRKW] hydrolase 7e-92 

[PIRKW] homodimer 2e-71 

[SUPFAM] inorganic pyrophosphatase 7e-92 

[PROSITE] PPASE 1 

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.27 % 
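
Report lines of the form "[KEY] description e-value" lend themselves to simple machine parsing. The following Python sketch is one possible way to split such a line into its key, its free-text value and an optional trailing e-value; the regular expression and function name are hypothetical and are not part of the PEDANT software.

```python
import re

# "[KEY] free text" with an optional trailing e-value such as 7e-92.
REPORT_LINE = re.compile(
    r"^\[\s*(\w+)\s*\]\s*(.*?)(?:\s+(\d+(?:\.\d+)?e-\d+))?\s*$"
)


def parse_report_line(line):
    """Return (key, value, e-value or None), or None if the line does not match."""
    match = REPORT_LINE.match(line)
    if match is None:
        return None
    key, value, evalue = match.groups()
    return key, value, float(evalue) if evalue else None


for text in ("[PIRKW] hydrolase 7e-92",
             "[LENGTH] 255",
             "[KW] LOW_COMPLEXITY 6.27 %"):
    print(parse_report_line(text))
```

Lines without a bracketed key, such as the SEQ, SEG and PRD rows, are returned as None and can be handled separately.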

SEQ MKKARNDEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYI 

SEG 

1hukB EGGGCEEEEEEEETTTbCBCEEETTTTTTTCEEECEETTEECBCCBBTTBTTbT 

SEQ WNYGTLPQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGE 

SEG 

1hukB CEEEETTTTCBTTTTEETTTTEECCCBCCEEEECCCCCCTTTEEEEEEEEEEEEETTTTB 

SEQ TDWKLIAINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFK 

SEG 

1hukB CEEEEEEEETTTTTGGGCCCHHHHHHHTTTHHHHHHHHHHHHCGGGCCCCCCBCGGGCCB 

SEQ NKAFALEVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARSLVESVSSSPN 

SEG xxxxxxxxx 

1hukB CHHHHHHHHHHHHHHHHHHHHCTTTTTTTCCCBTTTTTTT 

SEQ KESNEEEQVWHFLGK 

SEG xxxxxxx 

1hukB 



Prosite for DKFZphfbr2_64al5.2 
PS00387 85->92 PPASE PDOC00325 

(No Pfam data available for DKFZphfbr2_64al5.2) 

Pedant information for DKFZphfbr2_64al5, frame 3 

Report for DKFZphfbr2_64al5.3 

[LENGTH] 63 

[MW] 7405.54 

[pi] 6.81 

[HOMOL] SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE). 1e-06 

[EC] 3.6.1.1 Inorganic pyrophosphatase 5e-06 

[PIRKW] hydrolase 5e-06 

[SUPFAM] inorganic pyrophosphatase 5e-06 

[KW] All_Beta 



SEQ MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKEDTEAQGIFIDLSKIWKM 

PRD cccccccccccccccceeeeeecccccccccccccccccccccccccceeeechhhhhhh 

SEQ AFL 

PRD ccc 



(No Prosite data available for DKFZphfbr2_64al5.3) 
(No Pfam data available for DKFZphfbr2_64al5.3) 
DKFZphfbr2_64cl6 



group: brain derived 

DKFZphfbr2_64cl6.2 encodes a novel 101 amino acid protein without similarity to known 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by Qiagen 

Locus: /map="7 45_A_2; 756_F_2; 842_C_2" 
Insert length: 1866 bp 

Poly A stretch at pos. 1848, polyadenylation signal at pos. 1829 

1 GGGCGCGGCG CCGGAGGAGG AAGTGGTGAG GTTGTTGCTC CTTCAGCGCC 
51 TATCGCTGGC TCTTGGGGCG CAGAGAGGGG CCGCAGTCTC CGCGGCTGCG 

101 TCGAGCTCCC TTGCAGTCCC CTCCATGTTC CCCGGCGCCA CTACTCCCCT 

151 TCCTAAGGCC GCCGCTTACC CCGGGGTCTA TGGAAGTAAT GGAAGGACCC 

201 CTCAACCTGG CTCATCAACA GAGCAGACGA GCAGACCGTT TATTAGCTGC 

251 AGGCAAATAC GAAGAGGCTA TTTCTTGTCA CAAAAAGGCT GCAGCATATC 

301 TTTCTGAAGC CATGAAGCTG ACACAGTCAG AGCAGGCTCA TCTTTCACTG 

351 GAATTGCAAA GGGATAGCCA TATGAAACAG CTCCTCCTCA TCCAAGAGAG 

401 ATGGAAAAGG GCCCAGCGTG AAGAAAGATT GAAAGCCCAG CAGAACACAG 

451 ACAAGGATGC AGCTGCCCAT CTTCAGACAT CTCACAAACC CTCTGCAGAG 

501 GATGCAGAGG GCCAGAGTCC CCTTTCTCAG AAGTACAGCC CTTCCACAGA 

551 GAAATGCCTG CCTGAGATTC AGGGGATCTT TGACAGGGAT CCAGACACAC 

601 TACTTTATTT ACTTCAGCAA AAGAGTGAGC CAGCAGAGCC ATGTATTGGA 

651 AGCAAAGCCC CAAAAGATGA TAAAACAATT ATAGAGGAGC AGGCAACCAA 

701 AATTGCAGAT TTGAAGAGGC ATGTGGAATT CCTTGTGGCT GAGAATGAAA 

751 GATTAAGGAA AGAAAATAAA CAACTAAAGG CTGAAAAGGC CAGACTTCTA 

801 AAAGGTCCAA TAGAAAAGGA GCTGGATGTA GATGCTGATT TTGTAGAAAC 

851 GTCAGAGTTA TGGAGCTTGC CACCACATGC AGAAACTGCT ACAGCCTCCT 

901 CAACCTGGCA GAAGTTCGCA GCAAATACTG GGAAAGCCAA GGACATTCCA 

951 ATCCCCAATC TTCCTCCCTT GGATTTTCCA TCTCCAGAAC TTCCTCTTAT 
1001 GGAGCTCTCT GAGGATATTC TGAAAGGACT TATGAATAAT TAAAATGGAA 
1051 GGCCACAGAA AAGGGGAAAA GAGGAAATAA TACAGTAATC GTTAATCCAG 
1101 CAAAAAGAAA TGAAAAGGGA AAACCACATA GAAGGGTAAT CCCGGAAATG 
1151 CTTCATCTGG TGGACTGTGG GAGCAGAGGC ATTGCCAGGA CTTGGGAAAC 
1201 AGTCACTGTG AAATGCGCTG CGTATCTCAT TCACTCACTT CAGCTAATGA 
1251 CTCCGACTTG GCAGACGCTA AACTCATGGA GGTTCGGTTT CTCCTGATAC 
1301 AAACCAAATG GCTACCTGGA AGAATTTCTT TCAAGCAACA GTTATTTTTC 
1351 TTATCTTCAG GGTTAAAATG TATAAAAGTT ATGTGTAATT AATCTATAAT 
1401 GCCATAAATG ATAATGCAAA ACCTAAATAA TATGGTGGCC GGAGGGGCTG 
1451 CCTTATATTT GAAACATGCT TTCTATCATG CATTGACTGT ATGCATTTTG 
1501 TTAATGCACA TTCTGTTTGT TTAAGGTGTG TGAGATACAC ACCTTTCTAG 
1551 ATGAAACTAT ATGTGCCACA CTTTGCACTA CTCATAATGA TAACCTCAAG 
1601 ACTATCAGAA GAAATATTTA AATTTCCATT TTATGAAGAA AGGAACCAAA 
1651 TTATTATGCT TTTTAAAACA AATTACCAGT TTACATAATT AATCAGGGTG 
1701 CATTTTAAGT TCTAACTTCG TTTATTGTAT AATGCATCAT TTGAAAATAC 
1751 CAAGGAGGAA ATACCCTTTG TTTTTAATGA TGCAAGAGTG GACGTAATGC 
1801 TAGTTGGCAG TATTTTATTG TAAGAAATCA ATAAAGTAAT TGTGTTTTAA 
1851 AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS286143 from database EMBL: 
human STS WI-6844. 
Score = 1460, P = 3.4e-61, identities = 292/292 



Medline entries 



No Medline entry 
Peptide information for frame 2 



ORF from the beginning to 304 bp; peptide length: 102 
Category: questionable ORF 
Classification: unset 



1 GAAPEEEVVR LLLLQRLSLA LGAQRGAAVS AAASSSLAVP SMFPGATTPL 
51 PKAAAYPGVY GSNGRTPQPG SSTEQTSRPF ISCRQIRRGY FLSQKGCSIS 
101 F 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_64cl6, frame 2 



No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 180 bp to 1040 bp; peptide length: 287 
Category: putative protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (178-200) 
LEUCINE_ZIPPER (185-207) 



1 MEVMEGPLNL AHQQSRRADR LLAAGKYEEA ISCHKKAAAY LSEAMKLTQS 

51 EQAHLSLELQ RDSHMKQLLL IQERWKRAQR EERLKAQQNT DKDAAAHLQT 

101 SHKPSAEDAE GQSPLSQKYS PSTEKCLPEI QGIFDRDPDT LLYLLQQKSE 

151 PAEPCIGSKA PKDDKTIIEE QATKIADLKR HVEFLVAENE RLRKENKQLK 

201 AEKARLLKGP IEKELDVDAD FVETSELWSL PPHAETATAS STWQKFAANT 

251 GKAKDIPIPN LPPLDFPSPE LPLMELSEDI LKGLMNN 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_64cl6, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphfbr2_64cl6, frame 2 



Report for DKFZphfbr2_64cl6.2 



[LENGTH] 101 

[MW] 10469.94 

[pi] 10.18 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 29.70 % 



SEQ GAAPEEEVVRLLLLQRLSLALGAQRGAAVSAAASSSLAVPSMFPGATTPLPKAAAYPGVY 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ GSNGRTPQPGSSTEQTSRPFISCRQIRRGYFLSQKGCSISF 

SEG 

PRD ccccccccccccccccccccchhhhhccccccccccccccc 
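
The LOW_COMPLEXITY percentage in these reports appears to be the fraction of residues masked with "x" on the SEG rows. A minimal Python sketch of that calculation follows; the function name is hypothetical, and the count of 30 masked residues for this peptide is read from the SEG row above and should be treated as an assumption given the quality of the listing.

```python
def low_complexity_percent(seg_masks, length):
    """Percentage of residues masked with 'x' across all SEG rows.

    seg_masks is a list of SEG row strings (spacing is ignored);
    length is the peptide length from the [LENGTH] line.
    """
    masked = sum(mask.count("x") for mask in seg_masks)
    return round(100.0 * masked / length, 2)


# DKFZphfbr2_64cl6.2: one SEG row with 30 masked residues, the other empty.
print(low_complexity_percent(["x" * 30, ""], 101))
```

The same calculation applied to the coiled-coil rows (counting "C" on the COILS lines instead of "x") would give the COILED_COIL percentage in an analogous way.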



(No Prosite data available for DKFZphfbr2_64cl6.2) 
(No Pfam data available for DKFZphfbr2_64cl6.2) 



Pedant information for DKFZphfbr2_64cl6, frame 3 

Report for DKFZphfbr2_64cl6.3 



[LENGTH] 287 

[MW] 32343.79 

[pi] 5.61 

[PROSITE] LEUCINE_ZIPPER 2 

[KW] All_Alpha 

[KW] COILED_COIL 14.98 % 



SEQ MEVMEGPLNLAHQQSRRADRLLAAGKYEEAISCHKKAAAYLSEAMKLTQSEQAHLSLELQ 

PRD ccccchhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ RDSHMKQLLLIQERWKRAQREERLKAQQNTDKDAAAHLQTSHKPSAEDAEGQSPLSQKYS 

PRD hhcchhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccccccccccccccc 

COILS 

SEQ PSTEKCLPEIQGIFDRDPDTLLYLLQQKSEPAEPCIGSKAPKDDKTIIEEQATKIADLKR 

PRD cccccccchhhhhcccccchhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC 

SEQ HVEFLVAENERLRKENKQLKAEKARLLKGPIEKELDVDADFVETSELWSLPPHAETATAS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ STWQKFAANTGKAKDIPIPNLPPLDFPSPELPLMELSEDILKGLMNN 

PRD hhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhcce 

COILS 



Prosite for DKFZphfbr2_64cl6.3 

PS00029 178->200 LEUCINE_ZIPPER PDOC00029 

PS00029 185->207 LEUCINE_ZIPPER PDOC00029 
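
The two overlapping leucine zipper hits can be reproduced with a pattern search. The Python sketch below uses the leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L expressed as a regular expression with a lookahead, so that overlapping matches are found; the fragment is residues 171 to 210 of the frame 3 peptide as listed above, and the function name is hypothetical.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L; the lookahead allows overlapping matches.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(seq, offset=0):
    """Return (first, last) 1-based residue positions of every match.

    offset is the number of residues preceding seq in the full peptide.
    """
    return [
        (m.start() + 1 + offset, m.start() + len(m.group(1)) + offset)
        for m in LEUCINE_ZIPPER.finditer(seq)
    ]


# Residues 171-210 of the DKFZphfbr2_64cl6 frame 3 peptide.
FRAGMENT = "QATKIADLKRHVEFLVAENERLRKENKQLKAEKARLLKGP"

print(find_leucine_zippers(FRAGMENT, offset=170))
```

The matches begin at residues 178 and 185, in agreement with the listing. The last matched leucines fall at residues 199 and 206, so the end values printed in the listing (200 and 207) appear to be one past the final matched residue; this reading of the coordinate convention is an inference from the data, not a documented property of the tool.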



(No Pfam data available for DKFZphfbr2_64cl6.3) 
DKFZphfbr2_64c4 



group: brain derived 

DKFZphfbr2_64c4 encodes a novel 467 amino acid protein with similarity to A. thaliana T08I13.5. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to A. thaliana T08I13.5 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC005043 11 exons 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1559 bp 

Poly A stretch at pos. 1540, no polyadenylation signal found 



1 TGGGACCGCC GGAAGTTTCT GCCGCGGCTT TGCGGGGACG GGGGAGTGGT 

51 AGTGGGGGCT GCAGCTGCCG GACCCAGGCG CGATGGCTAC GGGCGCGGAT 

101 GTACGGGACA TTCTAGAACT CGGGGGTCCA GAAGGGGATG CAGCCTCTGG 

151 GACCATCAGC AAGAAGGACA TTATCAACCC GGACAAGAAA AAATCCAAGA 

201 AGTCCTCTGA GACACTGACT TTCAAGAGGC CCGAGGGCAT GCACCGGGAA 

251 GTCTATGCCT TGCTCTACTC TGACAAGAAG GATGCACCCC CACTGCTACC 

301 CAGTGACACT GGCCAGGGAT ACCGTACAGT GAAGGCCAAG TTGGGCTCCA 

351 AGAAGGTGCG GCCTTGGAAG TGGATGCCAT TCACCAACCC GGCCCGCAAG 

401 GACGGAGCAA TGTTCTTCCA CTGGCGACGT GCAGCGGAGG AGGGCAAGGA 

451 CTACCCCTTT GCCAGGTTCA ATAAGACTGT GCAGGAGCCT GTGTACTCGG 

501 AGCAGGAGTA CCAGCTTTAT CTCCACGATA ATGCTTGGAC TAAGGCAGAA 

551 ACTGACCACC TCTTTGACCT CAGCCGCCGC TTTGACCTGC GTTTTGTTGT 

601 TATCCATGAC CGGTATGACC ACCAGCAGTT CAAGAAGCGT TCTGTGGAAG 

651 ACCTGAAGGA GCGGTACTAC CACATCTGTG CTAAGCTTGC CAACGTGCGG 

701 GCTGTGCCAG GCACAGACCT TAAGATACCA GTATTTGATG CTGGGCACGA 

751 ACGACGGCGG AAGGAACAGC TTGAGCGTCT CTACAACCGG ACCCCAGAGC 

801 AGGTGGCAGA GGAGGAGTAC CTGCTACAGG AGCTGCGCAA GATTGAGGCC 

851 CGGAAGAAGG AGCGGGAGAA ACGCAGCCAG GACCTGCAGA AGCTGATCAC 

901 AGCGGCAGAC ACCACTGCAG AGCAGCGGCG CACGGAACGC AAGGCCCCCA 

951 AAAAGAAGCT ACCCCAGAAA AAGGAGGCTG AGAAGCCGGC TGTTCCTGAG 

1001 ACTGCAGGCA TCAAGTTTCC AGACTTCAAG TCTGCAGGTG TCACGCTGCG 

1051 GAGCCAACGG ATGAAGCTGC CAAGCTCTGT GGGACAGAAG AAGATCAAGG 

1101 CCCTGGAACA GATGCTGCTG GAGCTTGGTG TGGAGCTGAG CCCGACACCT 

1151 ACGGAGGAGC TGGTGCACAT GTTCAATGAG CTGCGAAGCG ACCTGGTGCT 

1201 GCTCTACGAG CTCAAGCAGG CCTGTGCCAA CTGCGAGTAT GAGCTGCAGA 

1251 TGCTGCGGCA CCGTCATGAG GCACTGGCCC GGGCTGGTGT GCTAGGGGGC 

1301 CCTGCCACAC CAGCATCAGG CCCAGGCCCG GCCTCTGCTG AGCCGGCAGT 

1351 GTCTGAACCC GGACTTGGTC CTGACCCCAA GGACACCATC ATTGATGTGG 

1401 TGGGCGCACC CCTCACGCCC AATTCGAGAA AGCGACGGGA GTCGGCCTCC 

1451 AGCTCATCTT CCGTGAAGAA AGCCAAGAAG CCGTGAGAGG CCCCACGGGG 

1501 TGTGGGCGAC GCTGTTATGT AAATAGAGCT GCTGAGTTGG AAAAAAAAAA 

1551 AAAAAAAAA 



BLAST Results 



Entry AC005043 from database EMBL: 

Homo sapiens clone NH0576N21; HTGS phase 1, 5 unordered pieces. 
Score = 1506, P = 4.6e-244, identities = 316/330 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 83 bp to 1483 bp; peptide length: 467 
Category: similarity to unknown protein 



1 MATGADVRDI LELGGPEGDA ASGTISKKDI INPDKKKSKK SSETLTFKRP 
51 EGMHREVYAL LYSDKKDAPP LLPSDTGQGY RTVKAKLGSK KVRPWKWMPF 
101 TNPARKDGAM FFHWRRAAEE GKDYPFARFN KTVQEPVYSE QEYQLYLHDN 
151 AWTKAETDHL FDLSRRFDLR FVVIHDRYDH QQFKKRSVED LKERYYHICA 
201 KLANVRAVPG TDLKIPVFDA GHERRRKEQL ERLYNRTPEQ VAEEEYLLQE 
251 LRKIEARKKE REKRSQDLQK LITAADTTAE QRRTERKAPK KKLPQKKEAE 
301 KPAVPETAGI KFPDFKSAGV TLRSQRMKLP SSVGQKKIKA LEQMLLELGV 
351 ELSPTPTEEL VHMFNELRSD LVLLYELKQA CANCEYELQM LRHRHEALAR 
401 AGVLGGPATP ASGPGPASAE PAVSEPGLGP DPKDTIIDVV GAPLTPNSRK 
451 RRESASSSSS VKKAKKP 

BLASTP hits 

Entry ATAC2337_5 from database TREMBLNEW: 

gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13 
genomic sequence, complete sequence. 

Score = 340, P = 2.6e-30, identities = 115/374, positives = 176/374 

Entry YE8D_SCHPO from database SWISSPROT: 

HYPOTHETICAL 47.1 KD PROTEIN C9G1.13C IN CHROMOSOME I. 

Score = 221, P = 1.9e-20, identities = 67/192, positives = 97/192 

Entry S64291 from database PIR: 

hypothetical protein YGR002c - yeast (Saccharomyces cerevisiae) 
Score = 202, P = 2.8e-13, identities = 71/260, positives = 124/260 



Alert BLASTP hits for DKFZphfbr2_64c4, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_64c4, frame 2 



Report for DKFZphfbr2_64c4.2 



[LENGTH] 467 

[MW] 53007.60 

[pi] 9.51 

[HOMOL] TREMBL:ATAC2337_5 gene: "T08I13.5" Arabidopsis thaliana chromosome II BAC T08I13 genomic sequence, complete sequence. 4e-29 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR002c] 1e-19 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 12 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 20.13 % 



SEQ MATGADVRDILELGGPEGDAASGTISKKDIINPDKKKSKKSSETLTFKRPEGMHREVYAL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeeeeeeeeccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ LYSDKKDAPPLLPSDTGQGYRTVKAKLGSKKVRPWKWMPFTNPARKDGAMFFHWRRAAEE 

SEG 

PRD hhhhccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhc 

SEQ GKDYPFARFNKTVQEPVYSEQEYQLYLHDNAWTKAETDHLFDLSRRFDLRFVVIHDRYDH 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhccceeeeeeccccc 

SEQ QQFKKRSVEDLKERYYHICAKLANVRAVPGTDLKIPVFDAGHERRRKEQLERLYNRTPEQ 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcchhh 

SEQ VAEEEYLLQELRKIEARKKEREKRSQDLQKLITAADTTAEQRRTERKAPKKKLPQKKEAE 

SEG xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KPAVPETAGIKFPDFKSAGVTLRSQRMKLPSSVGQKKIKALEQMLLELGVELSPTPTEEL 

SEG xxx 






PRD hccccccccccccccccceeehhhhhhhccccccchhhhhhhhhhhhhhhhcccccchhh 

SEQ VHMFNELRSDLVLLYELKQACANCEYELQMLRHRHEALARAGVLGGPATPASGPGPASAE 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PAVSEPGLGPDPKDTIIDVVGAPLTPNSRKRRESASSSSSVKKAKKP 

SEG xxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD cccccccccccccceeeeeccccccccccccccccccccceeecccc 



Prosite for DKFZphfbr2_64c4.2 

PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00002 412->416 GLYCOSAMINOGLYCAN PDOC00002
PS00004 35->39 CAMP_PHOSPHO_SITE PDOC00004
PS00004 39->43 CAMP_PHOSPHO_SITE PDOC00004
PS00004 184->188 CAMP_PHOSPHO_SITE PDOC00004
PS00004 451->455 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 321->324 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 448->451 PKC_PHOSPHO_SITE PDOC00005
PS00005 460->463 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006
PS00006 187->191 CK2_PHOSPHO_SITE PDOC00006
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 435->439 CK2_PHOSPHO_SITE PDOC00006
PS00007 131->139 TYR_PHOSPHO_SITE PDOC00007
PS00007 227->235 TYR_PHOSPHO_SITE PDOC00007
PS00007 116->125 TYR_PHOSPHO_SITE PDOC00007
PS00008 14->20 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_64c4.2) 
DKFZphfbr2_64h6 



group: brain derived 

DKFZphfbr2_64h6 encodes a novel 176 amino acid protein with similarity to predicted yeast 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to S.pombe SPBC337.09 and S.cerevisiae YER044C 

complete cDNA, complete cds according to YER044c/SPBC337.09, 
start at Bp 111, EST hits 

Sequenced by Qiagen 

Locus: /map="14" 

Insert length: 1212 bp 

Poly A stretch at pos. 1192, polyadenylation signal at pos. 1168 



1 GGGCTGGAGC TGTCCTGGGG GAGCTTGTTT GCGGCAGCGG CTGCTGCTGC 
51 CACTGCTGTG CTGGGGGCCC GGTCGCCAGG CAAAAAGCCC TCCCACGTTT 
101 GAGGGGAGTC ATGAGCCGTT TCCTGAATGT GTTAAGAAGT TGGCTGGTTA 
151 TGGTGTCCAT CATAGCCATG GGGAACACGC TGCAGAGCTT CCGAGACCAC 
201 ACTTTTCTCT ATGAAAAGCT CTACACTGGC AAGCCAAACC TTGTGAATGG 
251 CCTCCAAGCT CGGACCTTTG GGATCTGGAC GCTGCTCTCA TCAGTGATCC 
301 GCTGCCTCTG TGCCATTGAC ATTCACAACA AGACGCTCTA TCACATCACA 
351 CTCTGGACCT TCCTCCTTGC CCTGGGGCAT TTCCTCTCTG AGTTGTTTGT 
401 CTATGGAACT GCAGCTCCCA CGATTGGCGT CCTGGCACCC CTGATGGTGG 
451 CAAGTTTCTC CATCCTGGGT ATGCTGGTCG GGCTCCGGTA TCTAGAAGTA 
501 GAACCAGTAT CCAGACAGAA GAAGAGAAAC TGAGGCCAGC ATTATCACCT 
551 CCAGGACTTT CTCGTTTTCC ACCTTGGCCA TCTTCTTCCT TCGTCGTCTC 
601 TCCCCTTTAA TTTCTTTTCT ATTCCATCAT CTGCCCTTTT ACTCACTTTT 
651 AGCCTCTTTT TTTAATTTTT AAAATTTAAA GATATGCATA CTGAAAAGTA 
701 TATAACATGT ACGTACAATT TAAAGAATAA TTTTAAAGTG AATACTACGT 
751 AACTCCATCC AAGTCAAGAA ATTGCCAGCT TCTCGGAAGC CCACTGTGTC 
801 TCCTTCCCCT ACCTGCAACC TCTTCCAGGC TCCCTTTTCC AGCCTTCCCC 
851 TTTTTCCCTT TTATTTTCAT GCCTTGATTT GACTTGTGTG GTGGGAACAT 
901 GTGAACTATG AAACTTAAAC CTGCTGCCCA CCCAGAGCAG CTGTGACCAA 
951 GGGCTGCCTC AAGGGGTTGT CCACGCAGGT TGGGCTCCTC TCTGCTGCTG 
1001 GACCCAAGAC TCTGAACCTT CCAAGGGACA GGCAGTTCTT CTGAGAAGGG 
1051 CTCCCCTGTG TGTGAGCAAG ACCACAGCTC TCCTTCTATC TACAGATGCA 
1101 TGAGGGTTGG AAGAGTCTGG GCTGTTTTTA GACCTTCTGG TCAGCTGTAT 
1151 TTGTGTAACA ACTTTTGTAA TAAATAGAAA AACCCTCTGC TCAAAAAAAA 
1201 AAAAAAAAAA AA 


BLAST Results 



Entry G38566 from database EMBL: 

SHGC-64295 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 1398, P = 1.4e-56, identities = 284/288 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 0 bp to 530 bp; peptide length: 177 
Category: similarity to unknown protein 
Classification: unclassified 

1 AGAVLGELVC GSGCCCHCCA GGPVARQKAL PRLRGVMSRF LNVLRSWLVM 
51 VSIIAMGNTL QSFRDHTFLY EKLYTGKPNL VNGLQARTFG IWTLLSSVIR 
101 CLCAIDIHNK TLYHITLWTF LLALGHFLSE LFVYGTAAPT IGVLAPLMVA 
151 SFSILGMLVG LRYLEVEPVS RQKKRN 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64h6, frame 3 

TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical 
protein"; S.pombe chromosome II cosmid c337., N = 1, Score = 224, P = 
1.4e-18 

PIR:S50547 hypothetical protein YER044c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 192, P = 3.4e-15 



>TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical 
protein"; S.pombe chromosome II cosmid c337. 
Length = 136 

HSPs: 

Score = 224 (33.6 bits), Expect = 1.4e-18, P = 1.4e-18 
Identities = 49/113 (43%), Positives = 74/113 (65%) 

Query: 42 NVLRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRC 101 

+++ W V+VS + A+ NT+QSF L +++Y+ N VNGLQ RTFGIWTLLS+++R 

Sbjct: 11 SLVAKWNVVVSVAALFNTVQSFLTPK-LTKRVYSNT-NEVNGLQGRTFGIWTLLSAIVRF 68 

Query: 102 LCAIDIHNKTLYHITLWTFLLALGHFLSELFVYGTAAPTIGVLAPLMVASFSI 154 

CA I N +Y + T+ LA HFLSE ++ T G+L+P++V++ SI 

Sbjct: 69 YCAYHITNPDVYFLCQCTYYLACFHFLSEWLLFRTTNLGPGLLSPIVVSTVSI 121 



Pedant information for DKFZphfbr2_64h6, frame 3 



Report for DKFZphfbr2_64h6.3 



[LENGTH] 176 

[MW] 19359.31 

[pi] 9.53 

[HOMOL] TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical protein"; 
S.pombe chromosome II cosmid c337. 2e-17 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YER044c] 7e-16 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 7.39 % 



SEQ AGAVLGELVCGSGCCCHCCAGGPVARQKALPRLRGVMSRFLNVLRSWLVMVSIIAMGNTL 

SEG xxxxxxxxxxxxx 

PRD ccceeeeeeeeccceeeeccccccccccccccccchhhhhhhhhhhhhhheeeecccccc 

MEM MMMMMMMMMMMMMMMMM . . . . 

SEQ QSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLCAIDIHNKTLYHITLWTF 

SEG 

PRD ccccchhhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhccccceeeehhhhh 

MEM 

SEQ LLALGHFLSELFVYGTAAPTIGVLAPLMVASFSILGMLVGLRYLEVEPVSRQKKRN 

SEG 

PRD hhhhhhhhhhhhhhhccccccccccceeehhhhhhhhhhhheeeeecccccccccc 

MEM MMMMMMMMMMMMMMMMM 



(No Prosite data available for DKFZphfbr2_64h6.3) 
(No Pfam data available for DKFZphfbr2_64h6.3) 
DKFZphfbr2_64j18 



group: Intracellular transport and trafficking 

DKFZphfbr2_64j18.1 encodes a novel 180 amino acid protein nearly identical to the microsomal 
signal peptidase 23 kDa subunit of Canis familiaris, Gallus gallus and C. elegans. 

The new protein is identical to the canine and chicken microsomal signal peptidase 23 kDa subunit. 
The canine microsomal signal peptidase is a protein complex comprising five subunits (25, 
22/23, 21, 18, and 12 kDa). The 23 kDa subunit is tightly associated with the 18 and 21 kDa 
subunits, which are integral membrane proteins. 

The new protein can find application in modulation of protein transport into microsomal 
compartments and as a tool for proteomic analysis. 



strong similarity to dog signal peptidase (EC 3.4.99.-) 

complete cDNA, complete cds, potential start at Bp 109, EST hits, 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 690 bp 

Poly A stretch at pos. 666, polyadenylation signal at pos. 646 



1 GCCGGAACGC GCGCACCGCA GACGGCGCGG ATCGCAGGGA GCCGGTCCGC 

51 CGCCGGAACG GGAGCCTGGG TGTGCGTGTG GAGTCCGGAC TCGTGGGAGA 

101 CGATCGCGAT GAACACGGTG CTGTCGCGGG CGAACTCACT GTTCGCCTTC 

151 TCGCTGAGCG TGATGGCGGC GCTCACCTTC GGCTGCTTCA TCACCACCGC 

201 CTTCAAAGAC AGGAGCGTCC CGGTGCGGCT GCACGTCTCG CGGATCATGC 

251 TAAAAAATGT AGAAGATTTC ACTGGACCTA GAGAAAGAAG TGATCTGGGA 

301 TTTATCACAT CTGATATAAC TGCTGATCTA GAGAATATAT TTGATTGGAA 

351 TGTTAAGCAG TTGTTTCTTT ATTTATCAGC AGAATATTCA ACAAAAAATA 

401 ATGCTCTGAA CCAAGTTGTC CTATGGGACA AGATTGTTTT GAGAGGTGAT 

451 AATCCGAAGC TGCTGCTGAA AGATATGAAA ACAAAATATT TTTTCTTTGA 

501 CGATGGAAAT GGTCTCAAGG GAAACAGGAA TGTCACTTTG ACCCTGTCTT 

551 GGAACGTCGT ACCAAATGCT GGAATTCTAC CTCTTGTGAC AGGATCAGGA 

601 CACGTATCTG TCCCATTTCC AGATACATAT GAAATAACGA AGAGTTATTA 

651 AATTATTCTG AATTTGAAAC AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



89034208: 

cDNA-derived primary structure of the glycoprotein component of canine
microsomal signal peptidase complex.



Peptide information for frame 1 



ORF from 109 bp to 648 bp; peptide length: 180 
Category: strong similarity to known protein 
Prosite motifs: TONB_DEPENDENT_REC_1 (1-58)
RGD (148-151)



1 MNTVLSRANS LFAFSLSVMA ALTFGCFITT AFKDRSVPVR LHVSRIMLKN 

51 VEDFTGPRER SDLGFITSDI TADLENIFDW NVKQLFLYLS AEYSTKNNAL 

101 NQVVLWDKIV LRGDNPKLLL KDMKTKYFFF DDGNGLKGNR NVTLTLSWNV 

151 VPNAGILPLV TGSGHVSVPF PDTYEITKSY 
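
The agreement between the cDNA listing and the frame 1 peptide can be checked by hand. The following is a minimal sketch and not part of the original analysis: it assumes the standard genetic code (the source does not state which translation table was used) and translates the listing row that begins at position 101, starting from the reported ORF start at bp 109. The names `translate`, `row` and `orf_start` are illustrative.

```python
# Standard genetic code, indexed by base order T, C, A, G.
# Assumption: the source does not say which translation table was used.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate the complete codons of an upper-case DNA string.

    Translation stops at the first stop codon, which is not emitted.
    """
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Row of the cDNA listing covering positions 101-150, spaces removed.
row_start = 101
row = "CGATCGCGATGAACACGGTGCTGTCGCGGGCGAACTCACTGTTCGCCTTC"
orf_start = 109  # reported ORF start (1-based)

fragment = translate(row[orf_start - row_start:])
print(fragment)
```

The 14 residues obtained this way agree with the first 14 residues of the peptide listed above.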

BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_64j18, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_64j18, frame 1

Report for DKFZphfbr2_64j18.1



[LENGTH] 180

[MW] 20253.39

[pi] 8.66

[HOMOL] PIR:A31788 signal peptidase (EC 3.4.99.-) (SPC 22/23) - dog 1e-100

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YLR066w] 6e-15

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation,
farnesylation and processing) [S. cerevisiae, YLR066w] 6e-15

[PIRKW] transmembrane protein 2e-92

[PIRKW] glycoprotein 2e-92

[PIRKW] hydrolase 2e-92

[PROSITE] RGD 1

[PROSITE] MYRISTYL 2

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TONB_DEPENDENT_REC_1 1

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 32



SEQ MNTVLSRANSLFAFSLSVMAALTFGCFITTAFKDRSVPVRLHVSRIMLKNVEDFTGPRER

PRD ccccccchhhhhhhhhhhhhhhhhhhhhheeecccccceeehhhhhhhhhhhhccccccc 

SEQ SDLGFITSDITADLENIFDWNVKQLFLYLSAEYSTKNNALNQVVLWDKIVLRGDNPKLLL 

PRD ccccchhhhhhhhccccccchhhhhhhhhhhhhhhccccceeeeeeeceeecccchhhhh 

SEQ KDMKTKYFFFDDGNGLKGNRNVTLTLSWNVVPNAGILPLVTGSGHVSVPFPDTYEITKSY

PRD hhcccceeeeecccccccccceeeeeeeecccccceeeeeccccceeeeccccccccccc 



Prosite for DKFZphfbr2_64j18.1



PS00001 141->145 ASN_GLYCOSYLATION PDOC00001

PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005

PS00008 25->31 MYRISTYL PDOC00008

PS00008 135->141 MYRISTYL PDOC00008

PS00013 16->27 PROKAR_LIPOPROTEIN PDOC00013

PS00016 112->115 RGD PDOC00016

PS00430 1->22 TONB_DEPENDENT_REC_1 PDOC00354



(No Pfam data available for DKFZphfbr2_64j18.1)
DKFZphfbr2_64k24



group: transmembrane proteins 

DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with weak similarity to several known
proteins.

The novel protein contains 5 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to AMAC1 "testicular condensing enzyme";
membrane regions: 5

Summary: DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with
similarity to AMAC1 "testicular condensing enzyme"



similarity to AMAC1 "testicular condensing enzyme" 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1958 bp 

Poly A stretch at pos. 1939, polyadenylation signal at pos. 1918 



1 GGGCCCGCCT CGATTTTCCC AGGCGAGGGC ACGCCCGCGT CAGTCGCCTC
51 CGGGGCACCT TCCTCGCCAC GACACGCAGG TAACCGGGCC CCGGGAGCCG
101 GTCGGCGGCG GCGGACTGGG ACCTTGATCC TGCCTGCCCG GCCGCCCGAC
151 AAGGGAATGA GAGCGGACCC CGAACTCCAC ACACCCGCGT TTAGCCGCCA
201 CACCTAAGGG GCAGAACAGT CTTTTTGGGT AAGGGCCGGG CTGGGGGCGA
251 CGCGCCCCGC CCGCTTTGCA GACTTCGGGG TGCTCTGCAC GACGCCTGAA
301 AGGCCGCGGG GCCCGCATTT CTCTGTGCTG CCCTCCTGGA GAACCGCGAC
351 ACGGGGACGG GAGGGCCAGC ATCGGCTACG GCCCGGTTTC CCGTTTCTTT
401 CCTCTGTCGC GTCTGGGCCC TCCTGCAGCG TCCATGATGA AGGCCAGGGG
451 CTGTTGCTTT CCTCTCGCCC AGTAGCCAAC CCAAGCAAGG GAATTAATTA
501 TCTGAAGAAA TGGATACTTC TCCCTCCAGA AAATATCCAG TTAAAAAACG
551 GGTGAAAATA CATCCCAACA CAGTGATGGT GAAATATACT TCTCATTATC
601 CCCAGCCTGG CGATGATGGA TATGAAGAAA TCAATGAAGG CTATGGGAAT
651 TTTATGGAGG AAAATCCAAA GAAAGGTCTG CTGAGTGAAA TGAAAAAAAA
701 AGGGAGAGCT TTCTTTGGAA CCATGGATAC CCTACCTCCA CCAACAGAAG
751 ACCCAATGAT CAATGAGATT GGACAATTCC AGAGCTTTGC AGAAAAAAAC
801 ATTTTTCAAT CCCGAAAAAT GTGGATAGTG CTGTTTGGAT CTGCTTTGGC
851 TCATGGATGT GTAGCTCTTA TCACTAGGCT TGTTTCTGAT CGGTCTAAAG
901 TTCCATCTCT AGAACTGATT TTTATCCGTT CTGTTTTTCA GGTCTTATCT
951 GTGTTAGTTG TGTGTTACTA TCAGGAGGCC CCCTTTGGAC CCAGTGGATA
1001 CAGATTACGA CTCTTCTTTT ATGGTGTATG CAATGTCATT TCTATCACTT
1051 GTGCTTATAC ATCATTTTCA ATAGTTCCTC CCAGCAATGG GACCACTATG
1101 TGGAGAGCCA CAACTACAGT CTTCAGTGCC ATTTTGGCTT TTTTACTCGT
1151 AGATGAGAAA ATGGCTTATG TTGACATGGC TACAGTTGTT TGCAGCATCT
1201 TAGGTGTTTG TCTTGTCATG ATCCCAAACA TTGTTGATGA AGACAATTCT
1251 TTGTTAAATG CCTGGAAAGA AGCCTTTGGG TACACCATGA CTGTGATGGC
1301 TGGACTGACC ACTGCTCTCT CAATGATAGT ATACAGATCC ATCAAGGAGA
1351 AGATCAGCAT GTGGACTGCG CTGTTTACTT TTGGTTGGAC TGGGACAATT
1401 TGGGGAATAT CTACTATGTT TATTCTTCAA GAACCCATCA TCCCATTAGA
1451 TGGAGAAACC TGGAGTTATC TCATTGCTAT ATGTGTCTGT TCTACTGCAG
1501 CATTCTTAGG AGTTTATTAT GCCTTGGACA AATTCCATCC AGCTTTGGTT
1551 AGCACAGTAC AACATTTGGA GATTGTGGTA GCTATGGTCT TGCAGCTTCT
1601 CGTGCTGCAC ATATTTCCTA GCATCTATGA TGTTTTTGGA GGGGTAATCA
1651 TTATGATTAG TGTTTTTGTC CTTGCTGGCT ATAAACTTTA CTGGAGGAAT
1701 TTAAGAAGGC AGGACTACCA GGAAATACTA GACTCTCCCA TTAAATGAAT
1751 ACCTGATTAT TATTGTCTCA TTAATGTTCA GTTATTAATA TGTATACTGC
1801 CATTTTAATG TTTACCTATG AATGTCTTTT GTGTTATATA ACTGACAGAG
1851 TGCTATAAAA TATATAATAT ATACAAATGC AGAAAATTTA TTCTAGTCTA
1901 ATATATTCAA ATACAAATAT TAAATATATG AAATACGTTA AAAAAAAAAA
1951 AAAAAAAA



BLAST Results 



No BLAST result 
Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 510 bp to 1745 bp; peptide length: 412 
Category: similarity to known protein 
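
The ORF coordinates, reading frame and peptide length reported for each clone are tied together by simple arithmetic. The sketch below is illustrative only: the helper name `orf_summary` is hypothetical, and it assumes that positions are 1-based and inclusive and that the reported range excludes the stop codon, which is how the coordinates in this listing appear to behave.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range.

    Assumption: the range covers the coding codons only and excludes
    the stop codon.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return frame, length // 3


# ORF from 510 bp to 1745 bp, as reported for this clone.
print(orf_summary(510, 1745))
```

Under these assumptions the range 510 to 1745 spans 1236 bases, which corresponds to frame 3 and 412 residues, consistent with the peptide information given here.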



1 MDTSPSRKYP VKKRVKIHPN TVMVKYTSHY PQPGDDGYEE INEGYGNFME
51 ENPKKGLLSE MKKKGRAFFG TMDTLPPPTE DPMINEIGQF QSFAEKNIFQ
101 SRKMWIVLFG SALAHGCVAL ITRLVSDRSK VPSLELIFIR SVFQVLSVLV
151 VCYYQEAPFG PSGYRLRLFF YGVCNVISIT CAYTSFSIVP PSNGTTMWRA
201 TTTVFSAILA FLLVDEKMAY VDMATVVCSI LGVCLVMIPN IVDEDNSLLN
251 AWKEAFGYTM TVMAGLTTAL SMIVYRSIKE KISMWTALFT FGWTGTIWGI
301 STMFILQEPI IPLDGETWSY LIAICVCSTA AFLGVYYALD KFHPALVSTV
351 QHLEIVVAMV LQLLVLHIFP SIYDVFGGVI IMISVFVLAG YKLYWRNLRR
401 QDYQEILDSP IK



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_64k24, frame 3 

TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA,
complete cds., N = 1, Score = 191, P = 1.9e-12

TREMBL:BMAJ733_6 product: "hypothetical protein"; Bacillus megaterium
bgaM gene, N = 1, Score = 137, P = 1.6e-06

PIR:G71841 hypothetical protein jhp1155 - Helicobacter pylori (strain
J99), N = 1, Score = 129, P = 1.3e-05



>TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing 

enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA, complete
cds.

Length = 362 



HSPs:



Score = 191 (28.7 bits), Expect = 1.9e-12, P = 1.9e-12
Identities = 39/105 (37%), Positives = 66/105 (62%) 



Query: 289 FTFGWTGTIWGISTMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVS 348

F FG G + + +F+LQ P++P D +WS ++A+ + + +F+ V YA+ K HPALV
Sbjct: 248 FLFGLVGLMVSVPGLFVLQTPVLPQDTLSWSCVVAVGLLALVSFVCVSYAVTKAHPALVC 307

Query: 349 TVQHLEIVVAMVLQLLVLH--IFPSIYDVFGGVIIMISVFVLAGYKL 393

V H E+VVA++LQ VL+ + PS D+ G +++ S+ ++ L
Sbjct: 308 AVLHSEVVVALMLQYYVLYETVAPS--DIMGAGVVLGSIAIITAQNL 352
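
The percentages printed on the HSP summary line can be recomputed from the raw counts. The sketch below parses the line for this hit and compares the reported values with floored integer percentages. It is an illustration only: the helper names are hypothetical, and the observation that the values are truncated rather than rounded (66/105 is about 62.9%, reported as 62%) is an inference from the numbers shown here, not a statement about the alignment program itself.

```python
import re

HSP_LINE = "Identities = 39/105 (37%), Positives = 66/105 (62%)"


def parse_hsp_counts(line: str) -> dict[str, tuple[int, int, int]]:
    """Return {label: (matches, aligned_length, reported_percent)}."""
    found = re.findall(r"(\w+) = (\d+)/(\d+) \((\d+)%\)", line)
    return {label: (int(a), int(b), int(p)) for label, a, b, p in found}


def floor_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero."""
    return 100 * matches // length


counts = parse_hsp_counts(HSP_LINE)
for label, (matches, length, reported) in counts.items():
    print(label, floor_percent(matches, length), reported)
```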



Pedant information for DKFZphfbr2_64k24, frame 3 



Report for DKFZphfbr2_64k24.3



[LENGTH] 412 

[MW] 46449.87 

[pi] 6.99 

[HOMOL] TREMBL:AF016712_1 gene: "AMAC1"; product: "testicular condensing enzyme"; Mus
musculus testicular condensing enzyme (AMAC1) mRNA, complete cds. 8e-14

[PROSITE] MYRISTYL 6

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 5 



SEQ MDTSPSRKYPVKKRVKIHPNTVMVKYTSHYPQPGDDGYEEINEGYGNFMEENPKKGLLSE 

PRD ccccccccccccceeeecccceeeeeecccccccccceeeeecccccccccccccchhhh

MEM 

SEQ MKKKGRAFFGTMDTLPPPTEDPMINEIGQFQSFAEKNIFQSRKMWIVLFGSALAHGCVAL 

PRD hhhhcceeecccccccccccccceeeecccchhhhhhhhccceeeeeeeccccchhhhhc 

MEM 

SEQ ITRLVSDRSKVPSLELIFIRSVFQVLSVLVVCYYQEAPFGPSGYRLRLFFYGVCNVISIT

PRD chhhhhccccccccchhhhhhhhhhhheeeeeeeccccccccceeeeeeeecceeeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ CAYTSFSIVPPSNGTTMWRATTTVFSAILAFLLVDEKMAYVDMATVVCSILGVCLVMIPN 

PRD eccceeeeccccccceeeeeehhhhhhhhhhhhhhhhheeeeeeeeeeeeeeeeeeeecc 

MEM 

SEQ IVDEDNSLLNAWKEAFGYTMTVMAGLTTALSMIVYRSIKEKISMWTALFTFGWTGTIWGI 

PRD cccccchhhhhhhhhhhheeeeeeehhhhhhhcchhhhhhhhhhhhccccccccceeeec 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ STMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVSTVQHLEIVVAMV

PRD ceeeeeecccccccccceeeeeccchhhhhhhhhccccccccccchhhhhhhhhhhhhhh 

MEM MMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMM 

SEQ LQLLVLHIFPSIYDVFGGVIIMISVFVLAGYKLYWRNLRRQDYQEILDSPIK

PRD hhhhhhhhhccccccceeeeeeeeeecccccchhhhhhhhhhhhhhhccccc 

MEM MMMMMMM. . . . MMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphfbr2_64k24.3



PS00001 193->197 ASN_GLYCOSYLATION PDOC00001

PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005

PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005

PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005

PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005

PS00006 92->96 CK2_PHOSPHO_SITE PDOC00006

PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006

PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006

PS00008 70->76 MYRISTYL PDOC00008

PS00008 88->94 MYRISTYL PDOC00008

PS00008 110->116 MYRISTYL PDOC00008

PS00008 265->271 MYRISTYL PDOC00008

PS00008 295->301 MYRISTYL PDOC00008

PS00008 334->340 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_64k24.3)
DKFZphfbr2_6a17



group: brain derived 

DKFZphfbr2_6a17 encodes a novel 100 amino acid protein with very weak similarity to human
finger protein zfOC1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1424 bp 

Poly A stretch at pos. 1405, polyadenylation signal at pos. 1389



1 GGGACTGAGG GGGTGGGCTT ACTCCCTGGG CAGTCTTGGG GGCCAGAGCT 
51 GAGGCCAGTC CATATTACAG TGGCTGGGCT GTTTTTTTCA GTAGCCCCTA 
101 GCATTGGCTG GGATTCCTGT TCCTGGGTGC GCCTCCACCT CCCTTCTGAT 
151 GCTTCCTGGC TATGGTGGGG TGGGAACCTC AGTTTCCCCC AAAGTCTTCC 
201 CTGGATGCTG GCTTCAGGTT GAAGACCCTG GTTCTTCCAG TTCCTCACGG 
251 GTTAGGTAGG GGCTCCTGCA TCACCTTCAG AATCAGTTCC AACCCCCACT 
301 CTCCTTAGGC TTTGTGCTCT GCTCTGCCCT GCCAGGCTGC CCTTGTCCAT 
351 GTGAGTAGCA TGGGCGGGTG GTGGGGACGG CAGTGGTGAT GAAGGGGGTG 
401 CACCACAGGC CTCATGAAGC AGTTCCCACA TGGGCGTGTG GCTGGGGCGT 
451 GGCCACCACA GAGCACATGG CTGTGTCTAG GCGCAAGCAC TTTAGCAGTA 
501 TCTGTTTACA TGCGCAAGGA TCAAGCCGAC TACCTGTGCT GTCTACTGGG 
551 ACAGCAGTCT CCGAGCTACT CCGTACCTCC CTCTGCCAGG TCGTGGAGTT 
601 AGGCCCCAGT CCCTACTTGT CACTGGTTCC CACTGTGCTC CTAACTGTGC 
651 AGCACCTGGG AGCTCTGGCC TGGGGCTGGA GGCCCTGGTA GGAGCTGCAG 
701 TTGGAGGCCG TTCTGTGCCC AGCAGCGGTG AGCGGCTCCC ATGGGCCCTG 
751 TGTCTGCAGG GAGCCAGGGC TGCGGCACAT GTGCTGTGAA ACTGGCACCC
801 ACCTGGCGTG CTGCTGCCGC CACTTGCTTC CTGCAGCACC TCCTACCCTG 
851 CTCCGTGTCC TCCCTCTCCC CGCGCCTGGC TCAGGAGTGC TGGAAAAGCT 
901 CACGCCTCGG CCTGGGAGCC TGGCCTCTTG ATATACCTCG AGCTTCCCCT 
951 GTGCTCCCCA GCCCCAGGAC CACTGGCCCC TTGGCCTGAG GGGCTGGGGG 
1001 CCCCACGACC TGCAGCGTCG AGTCCGGGAG AGAGCCCGGA GCGGCGTGCC 
1051 ATCTCGGCTC GGCCTTGCTG AGAGCCTCCG CCCTGGCTTT CTCCCTGTCT 
1101 GGTTTCAGTG GCTCACGTTG GTGCTACACA GCTAGAATAG ATATATTTAG 
1151 AGAGAGAGAT ATTTTTAAGA CAAAGCCCAC AATTAGCTGT CCTTTAACAC 
1201 CGCAGAACCC CCTCCCAGAA GAAGAGCGAT CCCTCGGACG GTCCGGGCGG 
1251 GCACCCTCAG CCGGGCTCTT TGCAGAAGCA GCACCGCTGA CTGTGGGCCC 
1301 GGCCCTCAGA TGTGTACATA TACGGCTATT TCCTATTTTA CTGTTCTTCA 
1351 GATTTAGTAC TTGTAAATAA ACACACACAT TAAGGAGACA TTAAACATTT 
1401 TTGCCAAAAA AAAAAAAAAA AAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 389 bp to 688 bp; peptide length: 100 
Category: putative protein 



1 MKGVHHRPHE AVPTWACGWG VATTEHMAVS RRKHFSSICL HAQGSSRLPV 
51 LSTGTAVSEL LRTSLCQVVE LGPSPYLSLV PTVLLTVQHL GALAWGWRPW 

BLASTP hits 

Entry S70007 from database PIR:

finger protein zfOC1 - human (fragment)

Length = 183 

Score = 62 (21.8 bits), Expect = 0.24, Sum P(2) = 0.22 
Identities = 18/47 (38%), Positives = 24/47 (51%) 



Alert BLASTP hits for DKFZphfbr2_6a17, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_6a17, frame 2



Report for DKFZphfbr2_6a17.2



[LENGTH] 100

[MW] 10944.82 

[pi] 9.49 

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] Alpha_Beta 



SEQ MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPVLSTGTAVSEL 

PRD cccccccccccccccccccccchhhhhhhhhhcccccceeeccccccceeecccchhhhh 

SEQ LRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW

PRD hhhhheeeeecccccceeecchhhhhhhhhchhhhhcccc 



Prosite for DKFZphfbr2_6a17.2

PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005

PS00005 45->48 PKC_PHOSPHO_SITE PDOC00005

PS00008 20->26 MYRISTYL PDOC00008 

PS00008 54->60 MYRISTYL PDOC00008 
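
The Prosite hits for this peptide can be reproduced with a simple pattern scan. The sketch below is illustrative and not the tool used to produce the report: the two patterns are the public PROSITE definitions for PS00005 (`[ST]-x-[RK]`) and PS00008 (`G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`), which are not given in this document and are therefore assumptions here, written as Python regular expressions. The function name `scan` is hypothetical. Only start positions are returned; the end coordinates printed in the table appear to be one past the last matched residue, which is also an assumption.

```python
import re

# PROSITE-style patterns as regular expressions (assumed definitions,
# taken from the public PROSITE entries rather than from this document).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # PS00005
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # PS00008
}

# Frame 2 peptide of the clone, 100 residues, spaces removed.
PEPTIDE = (
    "MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPV"
    "LSTGTAVSELLRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW"
)


def scan(peptide: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all matches, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", peptide)]


hits = {name: scan(PEPTIDE, pattern) for name, pattern in PATTERNS.items()}
for name, starts in hits.items():
    print(name, starts)
```

With these patterns the scan finds PKC_PHOSPHO_SITE at residues 30 and 45 and MYRISTYL at residues 20 and 54, the same start positions as in the table above.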



(No Pfam data available for DKFZphfbr2_6a17.2)
DKFZphfbr2_6b24



group: metabolism 

DKFZphfbr2_6b24 encodes a novel 334 amino acid protein with similarity to several bacterial
dTDP-4-dehydrorhamnose reductases (EC 1.1.1.133).

The novel protein appears to be a human enzyme similar to dTDP-4-dehydrorhamnose reductases.
EC 1.1.1.133 catalyses the reaction:
dTDP-6-deoxy-L-mannose + NADP(+) <=> dTDP-4-dehydro-6-deoxy-L-mannose + NADPH.

The new protein can find application in modulation of rhamnose metabolism and as a new enzyme
for biotechnological production processes.



similar to dTDP-6-deoxy-L-mannose-dehydrogenases 
complete cDNA, EST hits, complete cds 

Nucleotide sugar metabolism; seems to be a dehydrogenase
localisation: region of primer A missing 

Sequenced by AGOWA 

Locus: /map="5" 

Insert length: 2054 bp 

Poly A stretch at pos. 2028, polyadenylation signal at pos. 2015



1 GGGGGAGGCC CGCGTCGATC CTGGGTTGGA GGAGGTGGCG GCCGCTGAGG
51 CTGCGGCGTG AAGACGGCGG GCATGGTGGG GCGGGAGAAA GAGCTCTCTA
101 TACACTTTGT TCCCGGGAGC TGTCGGCTGG TGGAGGAGGA AGTTAACATC
151 CCTAATAGGA GGGTTCTGGT TACTGGTGCC ACTGGGCTTC TTGGCAGAGC
201 TGTACACAAA GAATTTCAGC AGAATAATTG GCATGCAGTT GGCTGTGGTT
251 TCAGAAGAGC AAGACCAAAA TTTGAACAGG TTAATCTGTT GGATTCTAAT
301 GCAGTTCATC ACATCATTCA TGATTTTCAG CCCCATGTTA TAGTACATTG
351 TGCAGCAGAG AGAAGACCAG ATGTTGTAGA AAATCAGCCA GATGCTGCCT
401 CTCAACTTAA TGTGGATGCT TCTGGGAATT TAGCAAAGGA AGCAGCTGCT
451 GTTGGAGCAT TTCTCATCTA CATTAGCTCA GATTATGTAT TTGATGGAAC
501 AAATCCACCT TACAGAGAGG AAGACATACC AGCTCCCCTA AATTTGTATG
551 GCAAAACAAA ATTAGATGGA GAAAAGGCTG TCCTGGAGAA CAATCTAGGA
601 GCTGCTGTTT TGAGGATTCC TATTCTGTAT GGGGAAGTTG AAAAGCTCGA
651 AGAAAGTGCA GTGACTGTTA TGTTTGATAA AGTGCAGTTC AGCAACAAGT
701 CAGCAAACAT GGATCACTGG CAGCAGAGGT TCCCCACACA TGTCAAAGAT
751 GTGGCCACTG TGTGCCGGCA GCTAGCAGAG AAGAGAATGC TGGATCCATC
801 AATTAAGGGA ACCTTTCACT GGTCTGGCAA TGAACAGATG ACTAAGTATG
851 AAATGGCATG TGCAATTGCA GATGCCTTCA ACCTCCCCAG CAGTCACTTA
901 AGACCTATTA CTGACAGCCC TGTCCTAGGA GCACAACGTC CGAGAAATGC
951 TCAGCTTGAC TGCTCCAAAT TGGAGACCTT GGGCATTGGC CAACGAACAC
1001 CATTTCGAAT TGGAATCAAA GAATCACTTT GGCCTTTCCT CATTGACAAG
1051 AGATGGAGAC AAACGGTCTT TCATTAGTTT ATTTGTGTTG GGTTCTTTTT
1101 TTTTTTAAAT GAAAAGTATA GTATGTGGCC CTTTTTAAAG AACAAAGGAA
1151 ATAGTTTTGT ATGAGTACTT TAATTGTGAC TCTTAGGATC TTTCAGGTAA
1201 ATGATGCTCT TGCACTAGTG AAATTGTCTA AAGAAACTAA AGGGCAGTCA
1251 TGCCCTGTTT GCAGTAATTT TTCTTTTTAT CATTATGTTT GTCCTGGCTA
1301 AACTTGGAGT TTGAGTATAG TAAATTATGA TCCTTAAATA TTTGAGGGTC
1351 AGGATGAAGC AGATCTGCTG TAGACTTTTC AGATGAAATT GTTCATTCTC
1401 GTAACCTCCA TATTTTCAGG ATTTTTGAAG CTGTTGACCA TTTCATGTTG
1451 ATTATTTTAA ATTGTGTGGA ATAGTATAAA AATCATTGGT GTTCATTATT
1501 TGCTTTGCCT GAGCTCAGAT CAAAATGTTT GAAGAAAGGA ACTTTATTTT
1551 TGCAAGTTAC GTACAGTTTT TATGCTTGAG ATATTTCAAC ATGTTATGTA
1601 TATTGGAACT TCTACAGCTT GATGCCTCCT GCTTTTATAG CAGTTTATGG
1651 GGAGCACTTG AAAGAGCGTG TGTACATGTA TTTTTTTTCT AGGCAAACAT
1701 TGAATGCAAA CGTGTATTTT TTTAATATAA ATATATAACT GTCCTTTTCA
1751 TCCCATGTTG CCGCTAAGTG ATATTTCATA TGTGTGGTTA TACTCATAAT
1801 AATGGGCCTT GTAAGTCTTT TCACCATTCA TGAATAATAA TAAATATGTA
1851 CTGCTGGCAT GTAATGCTTA GTTTTCTTGT ATTTACTTCT TTTTTTTAAA
1901 TGTAAGGACC AAACTTCTAA ACTAATTGTT CTTTTGTTGC TTTAATTTTT
1951 AAAAATTACA TTCTTCTGAT GTAACATGTG ATACATACAA AAGAATATAG
2001 TTTAATATGT ATTGAAATAA AACACAATAA AATTAAAAAA AAAAAAAAAA
2051 AAAA



BLAST Results 



Entry G37115 from database EMBL: 
SHGC-56899 Human Homo sapiens STS genomic. 
Score = 446, P = 4.6e-14, identities = 90/91 

Medline entries



99109950: 

The metabolism of 6-deoxyhexoses in bacterial and animal
cells.



Peptide information for frame 1 



ORF from 73 bp to 1074 bp; peptide length: 334 
Category: similarity to known protein 



1 MVGREKELSI HFVPGSCRLV EEEVNIPNRR VLVTGATGLL GRAVHKEFQQ 

51 NNWHAVGCGF RRARPKFEQV NLLDSNAVHH IIHDFQPHVI VHCAAERRPD 

101 VVENQPDAAS QLNVDASGNL AKEAAAVGAF LIYISSDYVF DGTNPPYREE 

151 DIPAPLNLYG KTKLDGEKAV LENNLGAAVL RIPILYGEVE KLEESAVTVM 

201 FDKVQFSNKS ANMDHWQQRF PTHVKDVATV CRQLAEKRML DPSIKGTFHW

251 SGNEQMTKYE MACAIADAFN LPSSHLRPIT DSPVLGAQRP RNAQLDCSKL 

301 ETLGIGQRTP FRIGIKESLW PFLIDKRWRQ TVFH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6b24 , frame 1 

PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans, N = 1, Score = 293, P = 6.4e-26 

TREMBL:SSU51197_21 gene: "rhsD"; product:
"dTDP-6-deoxy-L-mannose-dehydrogenase"; Sphingomonas S88 sphingan
polysaccharide synthesis (spsG), (spsS), (spsR), glycosyl transferase
(spsQ), (spsI), glycosyl transferase (spsK), glycosyl transferase
(spsL), (spsJ), (spsF), (spsD), (spsC), (spsE), Urf 32, Urf 26,
ATP-binding cassette trans>., N = 1, Score = 291, P = 1e-25

SWISSPROT:RFBD_RHISN PROBABLE DTDP-4-DEHYDRORHAMNOSE REDUCTASE (EC 
1.1.1.133) (DTDP-4-KETO- L-RHAMNOSE REDUCTASE) (DTDP-6-DEOXY-L-MANNOSE 
DEHYDROGENASE) (DTDP-L- RHAMNOSE SYNTHETASE)., N = 1, Score = 283, P = 
7.4e-25 



>PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans 
Length = 294 

HSPs: 



Score = 293 (44.0 bits), Expect = 6.4e-26, P = 6.4e-26
Identities = 89/276 (32%), Positives = 151/276 (54%)



Query: 30 RVLVTGATGLLGRAVHKEFQQNNWHAVGCGFRRARPKFEQVNLLDSNAVHHIIHDFQPHV 89

R+L+TGA G LGR++ K N + V F ++++ + + V II F+P+V
Sbjct: 3 RLLITGAGGQLGRSLAKLLVDNGRYEV------LALDFSELDITNKDMVFSIIDSFKPNV 56

Query: 90 IVHCAAERRPDVVENQPDAASQLNVDASGNLAKEAAAVGAFLIYISSDYVFDG-TNPPYR 148

I++ AA D E + +A +NV LA+ A + + + +S+DYVFDG + Y+
Sbjct: 57 IINAAAYTSVDQAELEVSSAYSVNVRGVQYLAEAAIRHNSAILHVSTDYVFDGYKSGKYK 116

Query: 149 EEDIPAPLNLYGKTKLDGEKAVLENNLGAAVLRIPILYGEVEKLEESAVTVMFDKVQFSN 208

E DI PL +YGK+K +GE+ +L + + +LR +GE + V M ++ +
Sbjct: 117 ETDIIHPLCVYGKSKAEGERLLLTLSPKSIILRTSWTFGEYGN---NFVKTML-RLAKNR 172

Query: 209 KSANMDHWQQRFPTHVKDVATVCRQLAEKRMLDPSIK-GTFHWSGNEQMTKYEMACAIAD 267

+ Q PT+ D+A+V Q+AEK ++ ++K G +H++G ++ Y+ A AI D
Sbjct: 173 DILGVVADQIGGPTYSGDIASVLIQIAEKIIVGETVKYGIYHFTGEPCVSWYDFAIAIFD 232

Query: 268 AF-------NLPSSHLRPITDSPVLGAQRPRNAQLDCSKLE-TLGI 305

N+P + D P L A+RP N+ LD +K++ GI
Sbjct: 233 EAVAQKVLENVPLVNAITTADYPTL-AKRPANSCLDLTKIQQAFGI 277

Pedant information for DKFZphfbr2_6b24, frame 1



Report for DKFZphfbr2_6b24.1



[LENGTH] 334

[MW] 37551.98

[pi] 6.90

[HOMOL] PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) -
Actinobacillus actinomycetemcomitans 6e-25

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YGL001c] 6e-04

[EC] 1.1.1.133 dTDP-4-dehydrorhamnose reductase 2e-16

[PIRKW] lipopolysaccharide biosynthesis 2e-16

[PIRKW] NADP 2e-16

[PIRKW] oxidoreductase 2e-16

[PIRKW] streptomycin biosynthesis 1e-19

[SUPFAM] dTDP-dihydrostreptose synthase 1e-20

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta



SEQ MVGREKELSIHFVPGSCRLVEEEVNIPNRRVLVTGATGLLGRAVHKEFQQNNWHAVGCGF

PRD cccccceeeccccccceeeeecccccccceeeeeccccchhhhhhhhhhhccceeeeecc 

SEQ RRARPKFEQVNLLDSNAVHHIIHDFQPHVIVHCAAERRPDVVENQPDAASQLNVDASGNL 

PRD cccccccccccccchhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhhhhccchhh 

SEQ AKEAAAVGAFLIYISSDYVFDGTNPPYREEDIPAPLNLYGKTKLDGEKAVLENNLGAAVL

PRD hhhhhhhhheeeeeeccccccccccccccccccccccccchhhhhhhhhccccccceeee 

SEQ RIPILYGEVEKLEESAVTVMFDKVQFSNKSANMDHWQQRFPTHVKDVATVCRQLAEKRML 

PRD eeeeeecccccccchhhhhhhhhhhhhccceeeccccccccccchhhhhhhhhhhhhhhh 

SEQ DPSIKGTFHWSGNEQMTKYEMACAIADAFNLPSSHLRPITDSPVLGAQRPRNAQLDCSKL 

PRD cccccceeeeccccccchhhhhhhhhhhhhcccccccccccccccccccccccchhhhhh 

SEQ ETLGIGQRTPFRIGIKESLWPFLIDKRWRQTVFH 

PRD hhhhccccchhhhhhhhhhhhhhhhhhhhhcccc 



Prosite for DKFZphfbr2_6b24.1



PS00001 208->212 ASN_GLYCOSYLATION PDOC00001

PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005

PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005

PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005

PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006

PS00006 251->255 CK2_PHOSPHO_SITE PDOC00006

PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006

PS00006 298->302 CK2_PHOSPHO_SITE PDOC00006

PS00008 314->320 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_6b24.1)
DKFZphfbr2_6i20



group: brain derived 

DKFZphfbr2_6i20 encodes a novel 296 amino acid protein with similarity to ribosomal protein 
L15 precursor of S. cerevisiae mitochondria. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to ribosomal protein L15 precursor, mitochondrial

complete cDNA, complete cds, EST hits
potential mitochondrial L15 ribosomal protein

Sequenced by AGOWA 

Locus: /map="377.5 cR from top of Chr8 linkage group" 
Insert length: 1122 bp 

Poly A stretch at pos. 1099, polyadenylation signal at pos. 1071

1 gggggccctt gaaagttctt ggatctgcgg gttatggccg gtcccttgca 
51 gggcggtggg gcccgggccc tggacctact ccggggcctg ccgcgtgtga 
101 gcctggccaa cttaaagccg aatcccggct ccaagaaacc ggagagaaga 
151 ccaagaggtc ggagaagagg tagaaaatgt ggcagaggcc ataaaggaga 
201 aaggcaaaga ggaacccggc cccgcttggg ctttgaggga ggccagactc
251 cattttacat ccgaatccca aaatacgggt ttaacgaagg acatagtttc
301 agacgccagt ataagcctat gagtctcaat agactgcagt atcttattga
351 tttgggtcgt gttgatccta gtcaacctat tgacttaacc cagcttgtca
401 atgggagagg tgtgaccatc cagccactta aaagggatta tgatgtccag
451 ctggttgagg agggtgctga cacctttacg gcaaaagtta atattgaagt 
501 acagttggct tcagaactag ctattgctgc cattgaaaaa aatggtggtg 
551 ttgttactac agccttctat gatccaagaa gtctggacat tgtatgcaaa 
601 cctgttccat tctttcttcg tggacaaccc attccaaaaa gaatgcttcc 
651 accagaagaa ctggtaccat attacactga tgcaaagaac cgtgggtacc 
701 tggcggatcc tgccaaattt cctgaagcac gacttgaact cgccaggaag 
751 tatggttata tcttacctga tatcactaaa gatgaactct tcaaaatgct 
801 ctgtactagg aaggatccaa ggcagatttt ctttggtctt gctccaggat 
851 gggtggtgaa tatggccgat aagaaaatcc taaaacctac agatgaaaat 
901 ctccttaagt attatacctc atgaattccc gtccaaggaa gcagagttgt 
951 taaagagtac tggaataggg gctgaaggat ctatattccc ttattgcatt 

1001 TTCCTTATGT ATAATTTTCC AGATGGTGAT GTTACTTTTC AGTGTACTCA 
1051 TATGTCTCAT TTTCATCTAA AATTAAATGG CAGGAAACAA GGACTGCATA 
1101 GAGAAAAAAA AAAAAAAAAA AA 

BLAST Results 



Entry HS500354 from database EMBL: 
human STS WI-12392. 
Length = 426
Minus Strand HSPs:

Score = 1791 (268.7 bits), Expect = 1.1e-74, P = 1.1e-74

Identities = 375/384 (97%) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 34 bp to 921 bp; peptide length: 296 
Category: strong similarity to known protein 



1 MAGPLQGGGA RALDLLRGLP RVSLANLKPN PGSKKPERRP RGRRRGRKCG 

51 RGHKGERQRG TRPRLGFEGG QTPFYIRIPK YGFNEGHSFR RQYKPMSLNR

101 LQYLIDLGRV DPSQPIDLTQ LVNGRGVTIQ PLKRDYDVQL VEEGADTFTA 

151 KVNIEVQLAS ELAIAAIEKN GGVVTTAFYD PRSLDIVCKP VPFFLRGQPI 

201 PKRMLPPEEL VPYYTDAKNR GYLADPAKFP EARLELARKY GYILPDITKD 

251 ELFKMLCTRK DPRQIFFGLA PGWVVNMADK KILKPTDENL LKYYTS 

BLASTP hits 

Entry S63258 from database PIR: 

ribosomal protein L15 precursor, mitochondrial - yeast (Saccharomyces 

cerevisiae) 

Length = 322 

Score = 259 (91.2 bits), Expect = 2.0e-22, P = 2.0e-22 
Identities = 71/200 (35%), Positives = 106/200 (53%) 

Entry H70161 from database PIR: 

ribosomal protein L15 (rplO) - Lyme disease spirochete 
Length = 145 

Score = 173 (60.9 bits), Expect = 4.8e-13, P = 4.8e-13
Identities = 45/140 (32%), Positives = 73/140 (52%) 



Alert BLASTP hits for DKFZphfbr2_6i20 , frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6i20, frame 1 



Report for DKFZphfbr2_6i20.1



[LENGTH] 296

[MW] 33495.98 

[pi] 9.98 

[HOMOL] TREMBL:AF067212_1 gene: "F37F2.1"; Caenorhabditis elegans cosmid F37F2 . le 

[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YNL284c] 7e-15

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YNL284c] 7e-15

[FUNCAT] j mrna translation and ribosome biogenesis [M. genitalium, MG169] 1e-06

[BLOCKS] BL00475D 

[BLOCKS] BL00475B Ribosomal protein L15 proteins 

[PIRKW] ribosome 2e-13 

[PIRKW] mitochondrion 2e-13 

[PIRKW] protein biosynthesis 2e-13 

[SUPFAM] Escherichia coli ribosomal protein L15 4e-06 

[PROSITE] MYRISTYL 3 

[PROSITE] AMIDATION 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 12.50 % 
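
Report lines of this kind follow a regular `[TAG] value [e-value]` layout, so they can be split into fields mechanically. The parser below is a minimal sketch under that assumption and is not the tool that produced the report. The names `LINE_RE` and `parse_report` are hypothetical, and a trailing token is treated as an e-value only when it has the form `<number>e-<digits>`.

```python
import re

# [TAG] free text, optionally followed by an e-value such as 2e-13.
LINE_RE = re.compile(r"^\[(\w+)\]\s+(.*?)(?:\s+(\d+(?:\.\d+)?e-\d+))?$")


def parse_report(lines):
    """Return a list of (tag, value, evalue_or_None) tuples."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in [TAG] form
        tag, value, evalue = match.groups()
        records.append((tag, value, float(evalue) if evalue else None))
    return records


sample = [
    "[MW] 33495.98",
    "[PIRKW] ribosome 2e-13",
    "[SUPFAM] Escherichia coli ribosomal protein L15 4e-06",
    "[PROSITE] MYRISTYL 3",
]
records = parse_report(sample)
for record in records:
    print(record)
```

Keeping the e-value as a separate numeric field makes it straightforward to filter annotations by significance, for example retaining only entries at or below a chosen threshold.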

SEQ MAGPLQGGGARALDLLRGLPRVSLANLKPNPGSKKPERRPRGRRRGRKCGRGHKGERQRG 
SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxx . . . 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TRPRLGFEGGQTPFYIRIPKYGFNEGHSFRRQYKPMSLNRLQYLIDLGRVDPSQPIDLTQ

SEG 

PRD ccccccccccccceeeeeccccccccccccccccccchhhhhhhhhccccccccccccee 

SEQ LVNGRGVTIQPLKRDYDVQLVEEGADTFTAKVNIEVQLASELAIAAIEKNGGVVTTAFYD 

SEG 

PRD ecccceeeeccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhccceeeeeecc 

SEQ PRSLDIVCKPVPFFLRGQPIPKRMLPPEELVPYYTDAKNRGYLADPAKFPEARLELARKY 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ GYILPDITKDELFKMLCTRKDPRQIFFGLAPGWVVNMADKKILKPTDENLLKYYTS 

SEG 

PRD cccccccchhhhhhhhhcccccceeeeeccccceeeeccceeecccchhhhhcccc 



Prosite for DKFZphf br2_6i20 . 1 

PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005 

PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 

PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005

PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005

PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006

PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006

PS00008 8->14 MYRISTYL PDOC00008

PS00008 171->177 MYRISTYL PDOC00008

PS00008 268->274 MYRISTYL PDOC00008

PS00009 41->45 AMIDATION PDOC00009

PS00009 45->49 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_6i20.1)
DKFZphfbr2_6o17



group: nucleic acid management 

DKFZphfbr2_6o17 encodes a novel 455 amino acid protein with strong similarity to the DEAD-box
ATP-dependent RNA helicases YHR065c and T26G10.1.

The S. cerevisiae protein YHR065c is required for maturation of the 35S RNA primary
transcript.

The new protein can find application in modulating rRNA maturation. 



strong similarity to RNA helicases
complete cDNA, complete cds, EST hits

probable start at bp 27 matches Kozak consensus ANNatgG
involved in maturation of rRNA?

YHR065c/Rrp3p is involved in maturation of the 35S primary transcript
Drs1p cold-sensitive mutation has slow 27S to 25S pre-rRNA
conversion and is deficient in 60S ribosomal subunits

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1840 bp 

Poly A stretch at pos. 1815, polyadenylation signal at pos. 1793



1 GGGGACTTCC GGAGACCTCA CACAAGATGG CGGCACCCGA GGAACACGAT 
51 TCTCCGACCG AAGCGTCCCA GCCGATTGTG GAAGAGGAGG AAACTAAAAC 
101 ATTTAAAGAC CTGGGTGTGA CAGATGTGTT GTGTGAAGCT TGTGACCAGT 
151 TGGGATGGAC AAAACCCACC AAGATTCAGA TTGAAGCTAT TCCTTTGGCC 
201 TTACAAGGTC GTGATATCAT TGGGCTTGCA GAAACTGGCT CTGGAAAGAC 
251 AGGCGCCTTT GCTTTGCCCA TTCTAAACGC ACTGCTGGAG ACCCCGCAGC 
301 GTTTGTTTGC CCTAGTTCTT ACCCCGACTC GGGAGCTGGC CTTTCAGATC 
351 TCAGAGCAGT TTGAAGCCCT GGGGTCCTCT ATTGGAGTGC AGAGTGCTGT 
401 GATTGTAGGT GGAATTGATT CAATGTCTCA ATCTTTGGCC CTTGCAAAAA 
451 AACCACATAT AATAATAGCA ACTCCTGGTC GACTGATTGA CCACTTGGAA 
501 AATACGAAAG GTTTCAACTT GAGAGCTCTC AAATACTTGG TCATGGATGA 
551 AGCCGACCGA ATACTGAATA TGGATTTTGA GACAGAGGTT GACAAGATCC 
601 TCAAAGTGAT TCCTCGAGAT CGGAAAACAT TCCTCTTCTC TGCCACCATG 
651 ACCAAGAAGG TTCAAAAACT TCAGCGAGCA GCTCTGAAGA ATCCTGTGAA 
701 ATGTGCCGTT TCCTCTAAAT ACCAGACAGT TGAAAAATTA CAGCAATATT 
751 ATATTTTTAT TCCCTCTAAA TTCAAGGATA CCTACCTGGT TTATATTCTA 
801 AATGAATTGG CTGGAAACTC CTTTATGATA TTCTGCAGCA CCTGTAATAA 
851 TACCCAGAGA ACAGCTTTGC TACTGCGAAA TCTTGGCTTC ACTGCCATCC 
901 CCCTCCATGG ACAAATGAGT CAGAGTAAGC GCCTAGGATC CCTTAATAAG 
951 TTTAAGGCCA AGGCCCGTTC CATTCTTCTA GCAACTGACG TTGCCAGCCG 
1001 AGGTTTGGAC ATACCTCATG TAGATGTGGT TGTCAACTTT GACATTCCTA 
1051 CCCATTCCAA GGATTACATC CATCGAGTAG GTCGAACAGC TAGAGCTGGG 
1101 CGCTCCGGAA AGGCTATTAC TTTTGTCACA CAGTATGATG TGGAACTCTT 
1151 CCAGCGCATA GAACACTTAA TTGGGAAGAA ACTACCAGGT TTTCCAACAC 
1201 AGGATGATGA GGTTATGATG CTGACAGAAC GCGTCGCTGA AGCCCAAAGG 
1251 TTTGCCCGAA TGGAGTTAAG GGAGCATGGA GAAAAGAAGA AACGCTCGCG 
1301 AGAGGATGCT GGAGATAATG ATGACACAGA GGGTGCTATT GGTGTCAGGA 
1351 ACAAGGTGGC TGGAGGAAAA ATGAAGAAGC GGAAAGGCCG TTAATCACTT 
1401 TTATGAAGGC TCGAGTTCTG CTGTTCTGTA AAAGAAAATT GGAGAATGAA 
1451 ACCTGCTCCA ACAGAGATCA TGAGACTGAA ATTGGTCAGA ATTGTGTCCA 
1501 GAATGTGCTC AGCTAATTCA GTATTCTTCC CCATTCTGGG TTGGAGTTTA 
1551 CTGCAGAGTA ATTCTTACAG TGCTGATGTC AAGACTGTTA CTGTTCTTCG 
1601 ACTTTGATTC CTTGCTCATG ACATGAGTAG GGTGTGCTCT TCTGTCACTT 
1651 CACACAGACC TTTTGCCTTT TTTAGCTGCA AGTCAAGGAC TAGGTTGATG 
1701 ATGCCCATGA CCTGTAATTG TAAAGAAGCT TGGACATCTG CAAATGATAT 
1751 TTAAACCATC TTGGCTTGTG CTTTATTCAA ACTAATGTGA AACAATAAAT 
1801 TTAAATATTA TTTTTAAAAG AAAAAAAAAA AAAAAAAAAA 
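The peptide reported for this insert can be checked by translating the open reading frame directly. The sketch below assumes the standard genetic code; the helper name `translate` is illustrative and not part of the annotation pipeline used for the listing. The example input is the first 30 bases of the ORF (positions 27 to 56 of the sequence above).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


orf_start = "ATGGCGGCACCCGAGGAACACGATTCTCCG"  # positions 27-56 of the insert
print(translate(orf_start))
```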



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry

Peptide information for frame 3



ORF from 27 bp to 1391 bp; peptide length: 455 
Category: strong similarity to known protein 
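The ORF coordinates, the peptide length and the frame number are mutually constrained, which gives a quick consistency check on each entry. The sketch below assumes 1-based inclusive coordinates that exclude the stop codon and a frame numbering of 1 to 3 derived from the start position; both assumptions fit the figures quoted in this listing, and the function names are hypothetical.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Returns None when the span is not a whole number of codons, which
    usually points to a transcription error in one of the coordinates.
    """
    span = end_bp - start_bp + 1
    if span % 3:
        return None
    return span // 3


def reading_frame(start_bp):
    """Frame (1, 2 or 3) of an ORF starting at a 1-based position."""
    frame = start_bp % 3
    return 3 if frame == 0 else frame


print(orf_peptide_length(27, 1391), reading_frame(27))
```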



1 MAAPEEHDSP TEASQPIVEE EETKTFKDLG VTDVLCEACD QLGWTKPTKI
51 QIEAIPLALQ GRDIIGLAET GSGKTGAFAL PILNALLETP QRLFALVLTP
101 TRELAFQISE QFEALGSSIG VQSAVIVGGI DSMSQSLALA KKPHIIIATP
151 GRLIDHLENT KGFNLRALKY LVMDEADRIL NMDFETEVDK ILKVIPRDRK
201 TFLFSATMTK KVQKLQRAAL KNPVKCAVSS KYQTVEKLQQ YYIFIPSKFK
251 DTYLVYILNE LAGNSFMIFC STCNNTQRTA LLLRNLGFTA IPLHGQMSQS
301 KRLGSLNKFK AKARSILLAT DVASRGLDIP HVDVVVNFDI PTHSKDYIHR
351 VGRTARAGRS GKAITFVTQY DVELFQRIEH LIGKKLPGFP TQDDEVMMLT
401 ERVAEAQRFA RMELREHGEK KKRSREDAGD NDDTEGAIGV RNKVAGGKMK
451 KRKGR



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_6ol7, frame 3

PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis
elegans, N = 1, Score = 1497, P = 1.6e-153

PIR:S46713 hypothetical protein YHR065c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 1154, P = 3.6e-117

TREMBL:ATH010462_1 gene: "RH10"; product: "RNA helicase"; Arabidopsis
thaliana mRNA for dead box rna helicase, RH10, N = 1, Score = 1122, P =
8.9e-114

TREMBL:AC002985_2 product: "R27090_2"; Human DNA from chromosome
19-specific cosmid R27090, genomic sequence, complete sequence., N = 1,
Score = 950, P = 1.5e-95

>PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis 
elegans 

Length = 489 



HSPs: 



Score = 1497 (224.6 bits), Expect = 1.6e-153, P = 1.6e-153 
Identities = 283/442 (64%), Positives = 364/442 (82%)
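The percentages in the HSP header follow from the match counts and the aligned length. The sketch below assumes the values are truncated to whole percent rather than rounded, which is consistent with the figures quoted in this listing; the function name is illustrative.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage of matching columns, truncated (an assumption)."""
    return matches * 100 // aligned_length


identities = blast_percent(283, 442)
positives = blast_percent(364, 442)
print(f"Identities = 283/442 ({identities}%), Positives = 364/442 ({positives}%)")
```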



Query:  19 EEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 78
           E+ + K+F +LGV+ LC+AC +LGW KP+KIQ A+P ALQG+D+IGLAETGSGKTGAF
Sbjct:  39 EDVKEKSFAELGVSQPLCDACQRLGWMKPSKIQQAALPHALQGKDVIGLAETGSGKTGAF 98

Query:  79 ALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIGVQSAVIVGGIDSMSQSLA 138
           A+P+L +LL+ PQ F LVLTPTRELAFQI +QFEALGS IG+ +AVIVGG+D +Q++A
Sbjct:  99 AIPVLQSLLDHPQAFFCLVLTPTRELAFQIGQQFEALGSGIGLIAAVIVGGVDMAAQAMA 158

Query: 139 LAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRILNMDFETEVDKILKVIPRD 198
           LA++PHII+ATPGRL+DHLENTKGFNL+ALK+L+MDEADRILNMDFE E+DKILKVIPR+
Sbjct: 159 LARRPHIIVATPGRLVDHLENTKGFNLKALKFLIMDEADRILNMDFEVELDKILKVIPRE 218

Query: 199 RKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQYYIFIPSKFKDTYLVYIL 258
           R+T+LFSATMTKKV KL+RA+L++P + +VSS+Y+TV+ L+Q+YIF+P+K+K+TYLVY+L
Sbjct: 219 RRTYLFSATMTKKVSKLERASLRDPARVSVSSRYKTVDNLKQHYIFVPNKYKETYLVYLL 278

Query: 259 NELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILL 318
           NE AGNS ++FC+TC T + A++LR LG A+PLHGQMSQ KRLGSLNKFK+KAR IL+
Sbjct: 279 NEHAGNSAIVFCATCATTMQIAVMLRQLGMQAVPLHGQMSQEKRLGSLNKFKSKAREILV 338

Query: 319 ATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRSGKAITFVTQYDVELFQRI 378
            TDVA+RGLDIPHVD+V+N+D+P+ SKDY+HRVGRTARAGRSG AIT VTQYDVE +Q+I
Sbjct: 339 CTDVAARGLDIPHVDMVINYDMPSQSKDYVHRVGRTARAGRSGIAITVVTQYDVEAYQKI 398

Query: 379 EHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEKKK-----RSREDAGDNDD 433
           E +GKKL + ++EVM+L ER EA AR+E++E EKKK R +D GD + +
Sbjct: 399 EANLGKKLDEYKCVENEVMVLVERTQEATENARIEMKEMDEKKKSGKKRRQNDDFGDTEE 458

Query: 434 TEGAIGVRNKVAGGKMKKRKGR 455
           + G + K GG+ GR
Sbjct: 459 SGGRFKMGIKSMGGRGGSGGGR 480



Pedant information for DKFZphfbr2_6ol7, frame 3 



Report for DKFZphfbr2_6ol7.3



[LENGTH] 455
[MW] 50S46.80
[pI] 9.18
[HOMOL] PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis elegans 1e-167
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YHR065c] 1e-127
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YHR065c] 1e-127
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YHR169w] 2e-79
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-71
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-66
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-63
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 1e-58
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-55
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 5e-55
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 5e-55
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 9e-48
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 2e-45
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 4e-42
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 7e-16
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 7e-12
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 7e-12
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-06
[BLOCKS] BL00175B Phosphoglycerate mutase family phosphohistidine proteins
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-60
[PIRKW] RNA binding 7e-69
[PIRKW] DEAD box 7e-69
[PIRKW] transmembrane protein 9e-41
[PIRKW] DNA binding 3e-55
[PIRKW] recF recombination pathway 3e-11
[PIRKW] ATP 1e-126
[PIRKW] purine nucleotide binding 7e-69
[PIRKW] P-loop 1e-126
[PIRKW] hydrolase 1e-55
[PIRKW] protein biosynthesis 7e-69
[PIRKW] ATP binding 3e-61
[SUPFAM] ATP-dependent RNA helicase eIF-4A 8e-06
[SUPFAM] WW repeat homology 4e-58
[SUPFAM] translation initiation factor eIF-4A 7e-69
[SUPFAM] DEAD/H box helicase homology 1e-126
[SUPFAM] recQ helicase homology 5e-12
[SUPFAM] ATP-dependent RNA helicase homology 8e-06
[SUPFAM] unassigned DEAD/H box helicases 1e-126
[SUPFAM] ATP-dependent RNA helicase DBP1 4e-60
[SUPFAM] ATP-dependent RNA helicase DHH1 1e-58
[SUPFAM] recQ protein 3e-11
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 4e-58
[SUPFAM] Bloom's syndrome helicase 5e-12
[PROSITE] DEAD_ATP_HELICASE 1
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 9
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta


SEQ MAAPEEHDSPTEASQPIVEEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQ

PRD cccccccccccccccchhhhhhhhhhhccccchhhhhhhhhhcccccccccccccccccc 

SEQ GRDIIGLAETGSGKTGAFALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIG 

PRD ccceeeeeccccccceeehhhhhhhhcccccceeeeeeccchhhhhhhhhhhhhhhhhcc

SEQ VQSAVIVGGIDSMSQSLALAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRIL

PRD eeeeeeeccchhhhhhhhhhccceeeeeccccccccccccccccccccceeehhhhhhhh 

SEQ NMDFETEVDKILKVIPRDRKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQ 

PRD hhcchhhhhhhhhhcccchhhhhhhhccchhhhhhhhhhhccceeeeeecccccchhhhh 

SEQ YYIFIPSKFKDTYLVYILNELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQS 

PRD hhhhhhhhhhhhhhhhhhhhhccceeeeeeecchhhhhhhhhhhhcccceeeccccchhh 

SEQ KRLGSLNKFKAKARSILLATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRS 

PRD hhhhhhhhhhhhhhhcchhhhhhhhcccccceeeeeecccccccceeeeecccccccccc 

SEQ GKAITFVTQYDVELFQRIEHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEK 

PRD cceeeeeecchhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KKRSREDAGDNDDTEGAIGVRNKVAGGKMKKRKGR 

PRD hhhhccccccccccccccccccccccccccccccc 
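The PRD rows can be summarised as a secondary-structure composition, which is presumably what the [KW] Alpha_Beta keyword reflects. The sketch below assumes the usual three-state code (h = helix, e = extended strand, c = coil); the listing itself does not define the letters, and the function name is hypothetical.

```python
from collections import Counter


def ss_composition(prd_rows):
    """Fraction of residues in each predicted state across all PRD rows.

    Assumes h = helix, e = strand, c = coil; other characters are ignored.
    """
    counts = Counter(ch for row in prd_rows for ch in row.lower() if ch in "hec")
    total = sum(counts.values())
    return {state: counts[state] / total for state in "hec"} if total else {}


# Toy input for illustration, not taken from the report.
example = ss_composition(["cchhhh", "ee"])
print(example)
```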



Prosite for DKFZphfbr2_6ol7.3

PS00001 274->278 ASN_GLYCOSYLATION PDOC00001
PS00004 421->425 CAMP_PHOSPHO_SITE PDOC00004
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005
PS00005 72->75 PKC_PHOSPHO_SITE PDOC00005
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005
PS00005 229->232 PKC_PHOSPHO_SITE PDOC00005
PS00005 276->279 PKC_PHOSPHO_SITE PDOC00005
PS00005 300->303 PKC_PHOSPHO_SITE PDOC00005
PS00005 354->357 PKC_PHOSPHO_SITE PDOC00005
PS00005 360->363 PKC_PHOSPHO_SITE PDOC00005
PS00005 400->403 PKC_PHOSPHO_SITE PDOC00005
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006
PS00006 25->29 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 368->372 CK2_PHOSPHO_SITE PDOC00006
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006
PS00006 424->428 CK2_PHOSPHO_SITE PDOC00006
PS00008 66->72 MYRISTYL PDOC00008
PS00008 71->77 MYRISTYL PDOC00008
PS00008 116->122 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 128->134 MYRISTYL PDOC00008
PS00009 382->386 AMIDATION PDOC00009
PS00017 68->76 ATP_GTP_A PDOC00017
PS00039 172->181 DEAD_ATP_HELICASE PDOC00039
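The short phosphorylation-site entries in the table above can be reproduced with a regular-expression scan of the peptide. The sketch below is illustrative only: the two patterns are the commonly cited PROSITE consensus patterns for PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]), stated here as an assumption because the listing does not give them, and the coordinate convention (1-based start, end one past the last residue) is inferred from the ranges in the table. The function name is hypothetical.

```python
import re

# Assumed PROSITE consensus patterns, rewritten as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan(seq, pattern):
    """Return (start, end) pairs for every match, overlapping ones included.

    start is 1-based; end is one past the last matched residue.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# First 30 residues of the frame 3 peptide.
prefix = "MAAPEEHDSPTEASQPIVEEEETKTFKDLG"
pkc_hits = scan(prefix, PATTERNS["PKC_PHOSPHO_SITE"])
ck2_hits = scan(prefix, PATTERNS["CK2_PHOSPHO_SITE"])
print(pkc_hits, ck2_hits)
```

The lookahead group is used so that overlapping sites are not skipped, since several entries in the table overlap one another.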



Pfam for DKFZphfbr2_6ol7.3

HMM_NAME DEAD and DEAH box helicases

HMM       *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF
           G ++ ++++++++G++KPT+IQ +AIP++L+GRD+++ A TGSGKT+AF
Query   30 GVTDVLCEACDQLGWTKPTKIQIEAIPLALQGRDIIGLAETGSGKTGAF 78

HMM        11PMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMngIR
           ++P+L ++++P + ++AL+L+PTRELA QI+E+++++G++++ ++
Query   79 ALPILNALLETP----QR-LFALVLTPTRELAFQISEQFEALGSSIG-VQ 122

HMM        ImclYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIER.gtldLDrleML
           +++I+GG + + Q L+++P HI+IATPGRLIDH+E+ ++L+++++L
Query  123 SAVIVGGIDSMSQSLALAKKP-HIIIATPGRLIDHLENTKGFNLRALKYL 171

HMM        VMDEADRMLDMGFIDQIRrIMrqlPMpwNRQTMMFSATMPdeIqELARrF
           VMDEADR+L+M+F+ ++++I++ IP ++R T +FSATM++++Q+L+R+
Query  172 VMDEADRILNMDFETEVDKILKVIP--RDRKTFLFSATMTKKVQKLQRAA 219

HMM        MRNPIRlnldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe*
           ++NP+ ++ ++++T++ ++Q+YI+++ + K +L+++++
Query  220 LKNPVKCAVSSKYQTVE-KLQQYYIFIP-SKFKDTYLVYILN 259

HMM_NAME Helicases conserved C-terminal domain

HMM       *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR
           ++ + L+NLG++++ +HG+M+Q +R+ +++F++ +L++TDV++R
Query  277 QRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILLATDVASR 325

HMM        GIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*
           G+DIP V++V+N+D+P ++ +YI+R+GRT+R+G
Query  326 GLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAG 358

DKFZphfbr2_71o20



group: brain derived 

DKFZphfbr2_71o20 encodes a novel 232 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC006186 (3 exons) 

Sequenced by GBF 

Locus: /map="10q22.1"

Insert length: 1768 bp 

Poly A stretch at pos. 1742, polyadenylation signal at pos. 1726 



1 GGGGGCAGCA GGCCAAGGGG GAGGTGCGAG CGTGGACCTG GGACGGGTCT 
51 GGGCGGCTCT CGGTGGTTGG CACGGGTTCG CACACCCATT CAAGCGGCAG 
101 GACGCACTTG TCTTAGCAGT TCTCGCTGAC CGCGCTAGCT GCGGCTTCTA 
151 CGCTCCGGCA CTCTGAGTTC ATCAGCAAAC GCCCTGGCGT CTGTCCTCAC 
201 CATGCCTAGC CTTTGGGACC GCTTCTCGTC GTCGTCCACC TCCTCTTCGC 
251 CCTCGTCCTT GCCCCGAACT CCCACCCCAG ATCGGCCGCC GCGCTCAGCC 
301 TGGGGGTCGG CGACCCGGGA GGAGGGGTTT GACCGCTCCA CGAGCCTGGA 
351 GAGCTCGGAC TGCGAGTCCC TGGACAGCAG CAACAGTGGC TTCGGGCCGG 
401 AGGAAGACAC GGCTTACCTG GATGGGGTGT CGTTGCCCGA CTTCGAGCTG 
451 CTCAGTGACC CTGAGGATGA ACACTTGTGT GCCAACCTGA TGCAGCTGCT 
501 GCAGGAGAGC CTGGCCCAGG CGCGGCTGGG CTCTCGACGC CCTGCGCGCC 
551 TGCTGATGCC TAGCCAGTTG GTAAGCCAGG TGGGCAAAGA ACTACTGCGC 
601 CTGGCCTACA GCGAGCCGTG CGGCCTGCGG GGGGCGCTGC TGGACGTCTG 
651 CGTGGAGCAG GGCAAGAGCT GCCACAGCGT GGGCCAGCTG GCACTCGACC 
701 CCAGCCTGGT GCCCACCTTC CAGCTGACCC TCGTGCTGCG CCTGGACTCA 
751 CGACTCTGGC CCAAGATCCA GGGGCTGTTT AGCTCCGCCA ACTCTCCCTT 
801 CCTCCCTGGC TTCAGCCAGT CCCTGACGCT GAGCACTGGC TTCCGAGTCA 
851 TCAAGAAGAA GCTGTACAGC TCGGAACAGC TGCCCATTGA GGAGTGTTGA 
901 ACTTCAACCT GAGGGGGCCG ACAGTGCCCT CCAAGACAGA GACGACTGAA 
951 CTTTTGGGGT GGAGACTAGA GGCAGGAGCT GAGGGACTGA TTCCAGTGGT 
1001 TGGAAAACTG AGGCAGCCAC CTAAAGTGGA GGTGGGGGAA TAGTGTTTCC 
1051 CAGGAAGCTC ATTGAGTTGT GTGCGGGTGG CTGTGCATTG GGGACACATA 
1101 CCCCTCAGTA CTGTAGCATG AAACAAAGGC TTAGGGGCCA ACAAGGCTTC 
1151 CAGCTGGATG TGTGTGTAGC ATGTACCTTA TTATTTTTGT TACTGACAGT 
1201 TAACAGTGGT GTGACATCCA GAGAGCAGCT GGGCTGCTCC CGCCCCAGCC 
1251 TGGCCCAGGG TGAAGGAAGA GGCACGTGCT CCTCAGAGCA GCCGGAGGGA 
1301 AGGGGGAGGT CGGAGGTCGT GGAGGTGGTT TGTGTATCTT ACTGGTCTGA 
1351 AGGGACCAAG TGTGTTTGTT GTTTGTTTTG TATCTTGTTT TTCTGATCGG 
1401 AGCATCACTA CTGACCTGTT GTAGGCAGCT ATCTTACAGA CGCATGAATG 
1451 TAAGAGTAGG AAGGGGTGGG TCTCAGGGAT CACTTGGGAT CTTTGACACT 
1501 TGAAAAATTA CACCTGGCAG CTGCGTTTAA GCCTTCCCCC ATCGTGTACT 
1551 GCAGAGTTGA GCTGGCAGGG GAGGGGCTGA GAGGGTGGGG GCTGGAACCC 
1601 CTTCCCGGGA GGAGTGCCAT CTGGGTCTTC CATCTAGAAC TGTTTACATG 
1651 AAGATAAGAT ACTCACTGTT CATGAATACA CTTGATGTTC AAGTATTAAG 
1701 ACCTATGCAA TATTTTTTAC TTTTCTAATA AACATGTTTG TTAAAACAAA 
1751 AAAAAAAAAA AAAAAAAA 



BLAST Results 



Entry AC006186 from database EMBLNEW: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 10 clone 
CRI-JC2048 map 10q22.1; HTGS phase 1, 4 unordered pieces. 
Score = 6512, P = 0.0e+00, identities = 1326/1345 

3 exons 



Medline entries 



No Medline entry

Peptide information for frame 1



ORF from 202 bp to 897 bp; peptide length: 232
Category: putative protein 



1 MPSLWDRFSS SSTSSSPSSL PRTPTPDRPP RSAWGSATRE EGFDRSTSLE 

51 SSDCESLDSS NSGFGPEEDT AYLDGVSLPD FELLSDPEDE HLCANLMQLL

101 QESLAQARLG SRRPARLLMP SQLVSQVGKE LLRLAYSEPC GLRGALLDVC

151 VEQGKSCHSV GQLALDPSLV PTFQLTLVLR LDSRLWPKIQ GLFSSANSPF

201 LPGFSQSLTL STGFRVIKKK LYSSEQLPIE EC

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_71o20, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_71o20, frame 1 



Report for DKFZphfbr2_71o20.1



[LENGTH] 232 

[MW] 25354.60 

[pI] 4.37

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 17.67 %

SEQ MPSLWDRFSSSSTSSSPSSLPRTPTPDRPPRSAWGSATREEGFDRSTSLESSDCESLDSS
SEG XXXXXXXXXXXXXXXXXXXXXXXX xxxxxxxxxxxxxxx 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ NSGFGPEEDTAYLDGVSLPDFELLSDPEDEHLCANLMQLLQESLAQARLGSRRPARLLMP

SEG xx 

PRD cccccccccccccccccccceeeccccccchhhhhhhhhhhhhhhhhhccccccceeecc

SEQ SQLVSQVGKELLRLAYSEPCGLRGALLDVCVEQGKSCHSVGQLALDPSLVPTFQLTLVLR

SEG 

PRD ccccchhhhhhhhhhhcccccchhhhhhhhccccccccccccccccccccchhhhhhccc 

SEQ LDSRLWPKIQGLFSSANSPFLPGFSQSLTLSTGFRVIKKKLYSSEQLPIEEC 

SEG 

PRD cccccccccccccccccccccccccceeeecccccccccccccccccccccc 
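The LOW_COMPLEXITY keyword appears to be the share of residues masked with x in the SEG rows. For this 232-residue peptide, 17.67 % corresponds to 41 masked residues. The sketch below assumes that definition and that upper- and lower-case x are equivalent; the function names are illustrative.

```python
def count_masked(seg_rows):
    """Count residues masked with 'x' (either case) across the SEG rows."""
    return sum(row.lower().count("x") for row in seg_rows)


def low_complexity_percent(masked, peptide_length):
    """Masked residues as a percentage of the peptide length, two decimals."""
    return round(100.0 * masked / peptide_length, 2)


print(low_complexity_percent(41, 232))
```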



Prosite for DKFZphfbr2_71o20.1

PS00002 62->66 GLYCOSAMINOGLYCAN PDOC00002
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006
PS00006 85->89 CK2_PHOSPHO_SITE PDOC00006
PS00008 141->147 MYRISTYL PDOC00008
PS00008 191->197 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_71o20.1)

DKFZphfbr2_72bl8



group: nucleic acid management 

DKFZphfbr2_72bl8 encodes a novel 715 amino acid protein with similarity to E. coli DNA-damage-
inducible protein dinP and other proteins induced by DNA damage.

The novel protein is similar to dinP of E. coli, yqjH of B. subtilis, dinP of M. tuberculosis
and T19K24.15 of A. thaliana. The dinB/P pathway is a second SOS pathway in E. coli. Therefore
the new gene seems to be involved in DNA repair.

The new protein can find application in modulating DNA repair and mutagenesis. 



similarity to DNA damage induced genes 

complete cDNA, complete cds, potential start at Bp 49, EST hits 
localisation primer site B is missing! 

Sequenced by LMU 

Locus: /map="416.0 cR from top of Chr18 linkage group"??
Insert length: 2475 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2431



1 GGGGGAGGAA GGCGGCGGCG ACGACGAGGA AGACGCCGAG GCCTGGGCCA 
51 TGGAACTGGC GGACGTGGGG GCGGCAGCCA GCTCGCAGGG AGTTCATGAT 
101 CAAGTGTTGC CCACACCAAA TGCTTCATCC AGAGTCATAG TACATGTGGA 
151 TCTGGATTGC TTTTATGCAC AAGTAGAAAT GATCTCAAAT CCAGAGCTAA 
201 AAGACAAACC TTTAGGGGTT CAACAGAAAT ATTTGGTGGT TACCTGCAAC 
251 TATGAAGCTA GGAAACTTGG AGTTAAGAAA CTTATGAATG TCAGAGATGC 
301 AAAAGAAAAG TGTCCACAGT TGGTATTAGT TAATGGAGAA GACCTGACCC 
351 GCTACAGAGA AATGTCTTAT AAGGTTACAG AATTACTGGA AGAATTTAGT 
401 CCAGTTGTTG AGAGACTTGG ATTTGATGAA AATTTTGTGG ATCTAACAGA 
451 AATGGTTGAG AAGAGACTAC AGCAGCTGCA AAGTGATGAA CTTTCTGCGG
501 TGACTGTGTC GGGTCATGTA TACAATAATC AGTCTATAAA CCTGCTTGAC 
551 GTCTTGCACA TCAGACTACT TGTTGGATCT CAGATTGCAG CAGAGATGCG 
601 GGAAGCCATG TATAATCAGT TGGGGCTCAC TGGCTGTGCT GGAGTGGCTT 
651 CTAATAAACT GTTGGCAAAA TTAGTTTCTG GTGTCTTTAA ACCAAATCAA 
701 CAAACAGTCT TATTACCTGA AAGTTGTCAA CATCTTATTC ATAGTTTGAA 
751 TCACATAAAG GAAATACCTG GTATTGGCTA TAAAACTGCC AAATGTCTTG 
801 AAGCACTGGG TATCAATAGT GTGCGTGATC TCCAAACCTT TTCACCCAAA 
851 ATTTTAGAAA AAGAATTAGG AATTTCAGTT GCTCAGCGTA TCCAAAAGCT 
901 CAGTTTTGGA GAGGATAACT CCCCTGTGAT ACTCTCAGGA CCACCTCAGT 
951 CCTTTAGTGA AGAAGATTCA TTTAAAAAAT GTACATCTGA AGTTGAAGCT 
1001 AAAAATAAGA TTGAAGAACT ACTTGCTAGT CTTTTAAACA GAGTATGCCA 
1051 AGATGGAAGG AAGCCTCATA CAGTGAGATT AATAATCCGT CGGTATTCCT 
1101 CTGAGAAGCA CTATGGTCGT GAGAGTCGTC AGTGCCCTAT TCCTTCACAT 
1151 GTAATTCAGA AATTAGGGAC AGGAAATTAT GATGTGATGA CCCCAATGGT 
1201 TGATATACTT ATGAAACTTT TTCGAAATAT GGTGAATGTG AAGATGCCAT 
1251 TTCACCTTAC CCTTCTAAGT GTGTGCTTCT GCAACCTTAA AGCACTAAAT 
1301 ACTGCTAAGA AAGGGCTTAT TGATTATTAT TTAATGCCAT CATTATCAAC 
1351 TACTTCACGC TCTGGCAAGC ACAGTTTTAA AATGAAAGAC ACTCATATGG 
1401 AAGATTTTCC CAAAGACAAA GAAACAAACC GGGATTTCCT ACCAAGTGGA 
1451 AGAATTGAAA GTACAAGAAC TAGGGAGTCT CCACTAGATA CCACAAATTT 
1501 TTCTAAAGAA AAAGACATTA ATGAATTCCC ACTCTGTTCA CTTCCTGAAG 
1551 GTGTTGACCA AGAAGTCTCC AAGCAGCTTC CAGTAGATAT TCAAGAAGAA 
1601 ATCCTTTCTG GAAAATCTAG GGAAAAATTT CAAGGGAAAG GAAGTGTGAG 
1651 TTGTCCATTA CATGCCTCTA GAGGAGTATT ATCTTTCTTT TCTAAAAAAC 
1701 AAATGCAAGA TATTCCCATA AATCCTAGAG ATCATTTATC CAGTAGCAAA
1751 CAGGTATCCT CTGTATCTCC TTGTGAACCG GGAACATCAG GCTTTAATAG 
1801 CAGTAGTTCT TCTTACATGT CTAGCCAAAA GGATTATTCA TATTATTTAG 
1851 ATAATAGATT AAAAGATGAA CGAATAAGTC AAGGACCTAA AGAACCTCAA 
1901 GGATTCCACT TTACAAATTC AAACCCTGCT GTGTCTGCTT TTCATTCATT 
1951 TCCAAACTTG CAGAGTGAGC AACTTTTCTC CAGAAACCAC ACTACAGATA 
2001 GCCATAAGCA AACAGTAGCA ACAGACTCTC ATGAAGGACT TACAGAAAAT 
2051 AGAGAGCCAG ATTCTGTTGA TGAGAAAATT ACTTTCCCTT CTGACATTGA 
2101 TCCTCAAGTT TTCTATGAAC TACCAGAAGC AGTACAAAAG GAACTGCTGG 
2151 CAGAGTGGAA GAGAACAGGA TCAGATTTCC ACATTGGACA TAAATAAGCA 
2201 TATTCAGCAA AAAGGTCTGA AAAGCAAGGG AATACCATTA TTTTCGGATT 
2251 AGCGGTTTAT TAAGCTCTTC TATATTAAAC ACTAATAGAT ATTCAATAAC 
2301 GGAGTAAACT GTTCCAGATA AAGCAAGAAT AGTTGCAAGA AGTAAATTCT 
2351 GGCACAAAGC GTAAAAATAT AACAGAAGAA ATAATGTAAA ATACTATCTT
2401 TTATGTCTAA AGCCATTTTA TATTACTTTT CAATAAAAAG AATATCATGG 
2451 TCAAAAAAAA AAAAAAAAAA AAAAC 



BLAST Results

Entry HS086339 from database EMBL:
human STS WI-11064. 
Score = 1523, P = 3.0e-64, identities = 327/343 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2194 bp; peptide length: 715 
Category: similarity to known protein 



1 MELADVGAAA SSQGVHDQVL PTPNASSRVI VHVDLDCFYA QVEMISNPEL
51 KDKPLGVQQK YLVVTCNYEA RKLGVKKLMN VRDAKEKCPQ LVLVNGEDLT
101 RYREMSYKVT ELLEEFSPVV ERLGFDENFV DLTEMVEKRL QQLQSDELSA
151 VTVSGHVYNN QSINLLDVLH IRLLVGSQIA AEMREAMYNQ LGLTGCAGVA
201 SNKLLAKLVS GVFKPNQQTV LLPESCQHLI HSLNHIKEIP GIGYKTAKCL
251 EALGINSVRD LQTFSPKILE KELGISVAQR IQKLSFGEDN SPVILSGPPQ
301 SFSEEDSFKK CTSEVEAKNK IEELLASLLN RVCQDGRKPH TVRLIIRRYS
351 SEKHYGRESR QCPIPSHVIQ KLGTGNYDVM TPMVDILMKL FRNMVNVKMP
401 FHLTLLSVCF CNLKALNTAK KGLIDYYLMP SLSTTSRSGK HSFKMKDTHM
451 EDFPKDKETN RDFLPSGRIE STRTRESPLD TTNFSKEKDI NEFPLCSLPE
501 GVDQEVSKQL PVDIQEEILS GKSREKFQGK GSVSCPLHAS RGVLSFFSKK
551 QMQDIPINPR DHLSSSKQVS SVSPCEPGTS GFNSSSSSYM SSQKDYSYYL
601 DNRLKDERIS QGPKEPQGFH FTNSNPAVSA FHSFPNLQSE QLFSRNHTTD
651 SHKQTVATDS HEGLTENREP DSVDEKITFP SDIDPQVFYE LPEAVQKELL
701 AEWKRTGSDF HIGHK



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72bl8, frame 2 

PIR:H64747 DNA-damage-inducibile protein dinP - Escherichia coli, N = 
2, Score = 212, P = 4.2e-27 

PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis, 
N = 2, Score = 230, P = 5.2e-26 



>PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis 
Length = 414 

HSPs:

Score = 230 (34.5 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26
Identities = 47/112 (41%), Positives = 73/112 (65%)

Query:  27 SRVIVHVDLDCFYAQVEMISNPELKDKPLGV-----QQKYLVVTCNYEARKLGVKKLMNV 81
           SR+I H+D++ FYA VEM +P L+ KP+ V ++K +VVTC+YEAR GVK M V
Sbjct:   5 SRIIFHIDMNSFYASVEMAYDPALRGKPVAVAGNVKERKGIVVTCSYEARARGVKTTMPV 64

Query:  82 RDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVVERLGFDENFVDLTE 134
             AK CP+L+++ + RYR S + +L E++ +VE + DE + + D+T+
Sbjct:  65 WQAKRHCPELIVLP-PNFDRYRNSSRAMFTILREYTDLVEPVSIDEGYMDMTD 116

Score = 137 (20.6 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26
Identities = 43/148 (29%), Positives = 75/148 (50%)

Query: 178 QIAAEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIK 237
           + A E++ + +L L G+A NK LAK+ S + KP T+L ++ L +
Sbjct: 125 ETAKEIQSRLQKELLLPSSIGIAPNKFLAKMASDMKKPLGITILRKRQVPDILWPLP-VG 183

Query: 238 EIPGIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSG 297
           E+ G+G KTA+ L+ LGI+++ +L L++ LGI+ R++ + G ++PV
Sbjct: 184 EMHGVGKKTAEKLKGLGIHTIGELAAADEHSLKRLLGIN-GPRLKNKANGIHHAPV 238

Query: 298 PPQSFSEEDSFKKCTSEVEAKNKIEELL 325
           P+ E S ++ + EELL
Sbjct: 239 DPERIYEFKSVGNSSTLSHDSSDEEELL 266

Pedant information for DKFZphfbr2_72bl8, frame 2 
Report for DKFZphfbr2_72bl8.2

[LENGTH] 715
[MW] 80300.63
[pI] 6.37
[HOMOL] TREMBL:SPBC16A3_11 gene: "SPBC16A3.11"; product: "hypothetical protein"; S.pombe chromosome II cosmid c16A3. 5e-30
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDR419w] 2e-15
[FUNCAT] l genome replication, transcription, recombination and repair [M. genitalium, MG360] 3e-13
[PIRKW] SOS mutagenesis 2e-11
[PIRKW] DNA repair 2e-11
[PIRKW] induced mutagenesis 2e-11
[SUPFAM] umuC protein 3e-29
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 21
[PROSITE] ASN_GLYCOSYLATION 5
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.20 %



SEQ MELADVGAAASSQGVHDQVLPTPNASSRVIVHVDLDCFYAQVEMISNPELKDKPLGVQQK

SEG 

PRD ccceeeeeeecccccceeeccccccceeeeeeeccchhhhhhhhhccccccccceeeecc 

SEQ YLVVTCNYEARKLGVKKLMNVRDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVV 

SEG 

PRD ceeeehhhhhhhhhhcccchhhhhhhhccceeeeccccccchhhhhhhhhhhhhhhccce 

SEQ ERLGFDENFVDLTEMVEKRLQQLQSDELSAVTVSGHVYNNQSINLLDVLHIRLLVGSQIA 

SEG 

PRD eeeccchhhhhhhhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhhhhhhhhh 

SEQ AEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIKEIP 

SEG 

PRD hhhhhhhhhhhcceeeeccchhhhhhhhhhhhhcccceeeeecchhhhhhhhhccccccc 

SEQ GIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSGPPQ 

SEG 

PRD ccchhhhhhhhhhccccchhhhhhhhhhhhhhccchhhhhhhhhhcccccceeeeccccc 

SEQ SFSEEDSFKKCTSEVEAKNKIEELLASLLNRVCQDGRKPHTVRLIIRRYSSEKHYGRESR

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccccceeeehhhhhhhhhhhcccc 

SEQ QCPIPSHVIQKLGTGNYDVMTPMVDILMKLFRNMVNVKMPFHLTLLSVCFCNLKALNTAK

SEG 

PRD ccccccceeeeccccccccchhhhhhhhhhhhhhhhhcccceeeeeeeeechhhhhhhhh 

SEQ KGLIDYYLMPSLSTTSRSGKHSFKMKDTHMEDFPKDKETNRDFLPSGRIESTRTRESPLD

SEG 

PRD hhhheeeecccccccccccccceeeccccccccccccccccccccccccccccccccccc 

SEQ TTNFSKEKDINEFPLCSLPEGVDQEVSKQLPVDIQEEILSGKSREKFQGKGSVSCPLHAS 

SEG 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhcccceeeeecccccccchhhh 

SEQ RGVLSFFSKKQMQDIPINPRDHLSSSKQVSSVSPCEPGTSGFNSSSSSYMSSQKDYSYYL

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . 

PRD hcccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhh 

SEQ DNRLKDERISQGPKEPQGFHFTNSNPAVSAFHSFPNLQSEQLFSRNHTTDSHKQTVATDS 

SEG 

PRD hhhhhhhhhhcccccccceeeeccccceeecccccccchhhhhhhccccccceeeeeecc 

SEQ HEGLTENREPDSVDEKITFPSDIDPQVFYELPEAVQKELLAEWKRTGSDFHIGHK 

SEG 

PRD ccccccccccccccccccccccccceeehhhhhhhhhhhhhhhhhcccccccccc

Prosite for DKFZphfbr2_72bl8.2

PS00001 24->28 ASN_GLYCOSYLATION PDOC00001
PS00001 160->164 ASN_GLYCOSYLATION PDOC00001
PS00001 483->487 ASN_GLYCOSYLATION PDOC00001
PS00001 583->587 ASN_GLYCOSYLATION PDOC00001
PS00001 646->650 ASN_GLYCOSYLATION PDOC00001
PS00004 309->313 CAMP_PHOSPHO_SITE PDOC00004
PS00004 347->351 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 106->109 PKC_PHOSPHO_SITE PDOC00005
PS00005 201->204 PKC_PHOSPHO_SITE PDOC00005
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005
PS00005 265->268 PKC_PHOSPHO_SITE PDOC00005
PS00005 307->310 PKC_PHOSPHO_SITE PDOC00005
PS00005 341->344 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00005 418->421 PKC_PHOSPHO_SITE PDOC00005
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005
PS00005 438->441 PKC_PHOSPHO_SITE PDOC00005
PS00005 442->445 PKC_PHOSPHO_SITE PDOC00005
PS00005 459->462 PKC_PHOSPHO_SITE PDOC00005
PS00005 466->469 PKC_PHOSPHO_SITE PDOC00005
PS00005 471->474 PKC_PHOSPHO_SITE PDOC00005
PS00005 520->523 PKC_PHOSPHO_SITE PDOC00005
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005
PS00005 565->568 PKC_PHOSPHO_SITE PDOC00005
PS00005 592->595 PKC_PHOSPHO_SITE PDOC00005
PS00005 651->654 PKC_PHOSPHO_SITE PDOC00005
PS00006 46->50 CK2_PHOSPHO_SITE PDOC00006
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006
PS00006 285->289 CK2_PHOSPHO_SITE PDOC00006
PS00006 301->305 CK2_PHOSPHO_SITE PDOC00006
PS00006 303->307 CK2_PHOSPHO_SITE PDOC00006
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006
PS00006 448->452 CK2_PHOSPHO_SITE PDOC00006
PS00006 459->463 CK2_PHOSPHO_SITE PDOC00006
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006
PS00006 592->596 CK2_PHOSPHO_SITE PDOC00006
PS00006 672->676 CK2_PHOSPHO_SITE PDOC00006
PS00006 681->685 CK2_PHOSPHO_SITE PDOC00006
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006
PS00007 101->108 TYR_PHOSPHO_SITE PDOC00007
PS00007 348->356 TYR_PHOSPHO_SITE PDOC00007
PS00008 7->13 MYRISTYL PDOC00008
PS00008 176->182 MYRISTYL PDOC00008
PS00008 192->198 MYRISTYL PDOC00008
PS00008 198->204 MYRISTYL PDOC00008
PS00008 274->280 MYRISTYL PDOC00008
PS00008 663->669 MYRISTYL PDOC00008
PS00009 335->339 AMIDATION PDOC00009
PS00013 186->197 PROKAR_LIPOPROTEIN PDOC00013

(No Pfam data available for DKFZphfbr2_72bl8.2)

DKFZphfbr2_72dl3



group: brain derived 

DKFZphfbr2_72dl3 encodes a novel 165 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown 

seems to be testis specific; 9 of 10 EST hits are from testis libraries
Sequenced by LMU
Locus: unknown
Insert length: 723 bp 

Poly A stretch at pos. 704, no polyadenylation signal found 

1 AGGGGGGGTA TGGGGGAGGG GGAGACTCTG CAGGAGCCTA ATTCCCCACT 
51 CTGAGCTCAC CCTTCTGTCT GCCCGGGCCC TACCCCTTCC CCTACTCTCA 
101 CCCTTATAAT CCTTTTCAGC ACTAGGTCTT CCCGTCACCT CCACCTCTCT 
151 CCATGACCCG GCTCTGCTTA CCCAGACCCG AAGCACGTGA GGATCCGATC 
201 CCAGTTCCTC CAAGGGGCCT GGGTGCTGGG GAGGGGTCAG GTAGTCCAGT
251 GCGTCCACCT GTATCCACCT GGGGCCCTAG CTGGGCCCAG CTCCTGGACA
301 GTGTCCTATG GCTGGGGGCA CTAGGACTGA CAATCCAGGC AGTCTTTTCC
351 ACCACTGGCC CAGCCCTGCT GCTGCTTCTG GTCAGCTTCC TCACCTTTGA
401 CCTGCTCCAT AGGCCCGCAG GTCACACTCT GCCACAGCGC AAACTTCTCA
451 CCAGGGGCCA GAGTCAGGGG GCCGGTGAAG GTCCTGGACA GCAGGAGGCT
501 CTACTCCTGC AAATGGGTAC AGTCTCAGGA CAACTTAGCC TCCAGGACGC 
551 ACTGCTGCTG CTGCTCATGG GGCTGGGCCC GCTCCTGAGA GCCTGTGGCA 
601 TGCCCTTGAC CCTGCTTGGC CTGGCTTTCT GCCTCCATCC TTGGGCCTGA 
651 GAGCCCCTCC CCACAACTCA GTGTCCTTCA AATATACAAT GACCACCCTT 
701 CTTCAAAAAA AAAAAAAAAA AAC



BLAST Results 



Entry HS860F19 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone B60F19 
Score = 2059, P = 1.1e-85, identities = 423/434
2 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 153 bp to 647 bp; peptide length: 165 
Category: putative protein 
Classification: no clue 
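
The ORF coordinates and peptide lengths reported in these entries can be cross-checked with simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted, which is what the reported lengths imply; the function name and the dictionary are illustrative only and are not part of the original annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon lies outside the reported interval.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a positive multiple of 3")
    return span // 3


# (ORF start, ORF end, reported peptide length) as quoted in the entries.
REPORTED = {
    "DKFZphfbr2_72dl3": (153, 647, 165),
    "DKFZphfbr2_72112": (201, 1232, 344),
    "DKFZphfbr2_72ml6": (87, 947, 287),
    "DKFZphfbr2_72nl2": (227, 577, 117),
    "DKFZphfbr2_78c24": (201, 1889, 563),
}

for clone, (start, end, length) in REPORTED.items():
    # Each reported length should equal (end - start + 1) / 3.
    assert orf_peptide_length(start, end) == length, clone
```

A mismatch here would point to a transcription error in either the coordinates or the peptide length.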



1 MTRLCLPRPE AREDPIPVPP RGLGAGEGSG SPVRPPVSTW GPSWAQLLDS 

51 VLWLGALGLT IQAVFSTTGP ALLLLLVSFL TFDLLHRPAG HTLPQRKLLT 

101 RGQSQGAGEG PGQQEALLLQ MGTVSGQLSL QDALLLLLMG LGPLLRACGM 

151 PLTLLGLAFC LHPWA 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72dl3, frame 3 
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_72dl3, frame 3



Report for DKFZphfbr2_72dl3.3

[LENGTH] 165

[MW] 17393.73

[pI] 7.80

[BLOCKS] BL00068A Malate dehydrogenase proteins 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 29.70 % 

SEQ MTRLCLPRPEAREDPIPVPPRGLGAGEGSGSPVRPPVSTWGPSWAQLLDSVLWLGALGLT 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhcccccc 
MEM 

SEQ IQAVFSTTGPALLLLLVSFLTFDLLHRPAGHTLPQRKLLTRGQSQGAGEGPGQQEALLLQ 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxx . . . . 

PRD eeeecccccchhhhhhhhhhhhhhccccccccccccccccccccccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MGTVSGQLSLQDALLLLLMGLGPLLRACGMPLTLLGLAFCLHPWA 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccchhhhhhhhhhhhccchhhhhcccccchhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMM 
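
The LOW_COMPLEXITY percentage in each report appears to be the fraction of residues masked with "x" in the SEG rows, expressed to two decimals. The sketch below checks that reading; the run lengths were counted from the SEG rows of this listing and should be treated as an assumption, and the function name is hypothetical.

```python
def masked_percent(masked_run_lengths, protein_length):
    """Percentage of residues covered by masked (x) runs, to two decimals."""
    return round(100.0 * sum(masked_run_lengths) / protein_length, 2)


# Run lengths of 'x' characters counted from the SEG rows (assumed counts).
print(masked_percent([14, 15, 20], 165))  # DKFZphfbr2_72dl3, reported 29.70 %
print(masked_percent([19, 17, 21], 344))  # DKFZphfbr2_72112, reported 16.57 %
print(masked_percent([18], 287))          # DKFZphfbr2_72ml6, reported 6.27 %
print(masked_percent([14, 12, 12], 563))  # DKFZphfbr2_78c24, reported 6.75 %
```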

(No Prosite data available for DKFZphfbr2_72dl3.3)
(No Pfam data available for DKFZphfbr2_72dl3.3)




DKFZphfbr2_72112 






group: nucleic acid management 

Summary DKFZphfbr2_72112 encodes a novel 344 amino acid protein with similarity to YDR126w and
other S. cerevisiae proteins.

The novel protein contains a myc-type, helix-loop-helix dimerization domain signature. This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as
the myc family of cellular oncogenes, proteins involved in myogenesis, and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers. Therefore, the
protein could be a novel DNA-binding protein.

The new protein can find application in modulating gene expression.



similarity to YDR126w;
membrane regions: 2

similarity to YDR126w

complete cDNA, complete cds, EST hits

Sequenced by LMU

Locus: unknown

Insert length: 1270 bp

Poly A stretch at pos. 1251, no polyadenylation signal found



1 GGGGGCGCCC GGGAGGCGCC GGAGCCCAGC GGCTGGCGCC AGATCCAGGC
51 TCCTGGAAGA ACCATGTCCG GCAGCTACTG GTCATGCCAG GCACACACTG
101 CTGCCCAAGA GGAGCTGCTG TTTGAATTAT CTGTGAATGT TGGGAAGAGG
151 AATGCCAGAG CTGCCGGCTG AAAATTACCC AACCAAGAGA AATCTGCAGG
201 ATGGACTTTC TGGTCCTCTT CTTGTTCTAC CTGGCTTCGG TGCTGATGGG
251 TCTTGTTCTT ATCTGCGTCT GCTCGAAAAC CCATAGCTTG AAAGGCCTGG
301 CCAGGGGAGG AGCACAGATA TTTTCCTGTA TAATTCCAGA ATGTCTTCAG
351 AGAGCCGTGC ATGGATTGCT TCATTACCTT TTCCATACGA GAAACCACAC
401 CTTCATTGTC CTGCACCTGG TCTTGCAAGG GATGGTTTAT ACTGAGTACA
451 CCTGGGAAGT ATTTGGCTAC TGTCAGGAGC TGGAGTTGTC CTTGCATTAC
501 CTTCTTCTGC CCTATCTGCT GCTAGGTGTA AACCTGTTTT TTTTCACCCT
551 GACTTGTGGA ACCAATCCTG GCATTATAAC AAAAGCAAAT GAATTATTAT
601 TTCTTCATGT TTATGAATTT GATGAAGTGA TGTTTCCAAA GAACGTGAGG
651 TGCTCTACTT GTGATTTAAG GAAACCAGCT CGATCCAAGC ACTGCAGTGT
701 GTGTAACTGG TGTGTGCACC GTTTCGACCA TCACTGTGTT TGGGTGAACA
751 ACTGCATCGG GGCCTGGAAC ATCAGGTACT TCCTCATCTA CGTCTTGACC
801 TTGACGGCCT CGGCTGCCAC CGTCGCCATT GTGAGCACCA CTTTTCTGGT
851 CCACTTGGTG GTGATGTCAG ATTTATACCA GGAGACTTAC ATCGATGACC
901 TTGGACACCT CCATGTTATG GACACGGTCA TTCTTATTCA GTACCTGTTC
951 CTGACTTTTC CACGGATTGT CTTCATGCTG GGCTTTGTCG TGGTCCTGAG
1001 CTTCCTCCTG GGTGGCTACC TGTTGTCTGT CCTGTATCTG GCGGCCACCA
1051 ACCAGACTAC TAACGAGTGG TACAGAGGTG TCTGGGCCTG GTGCCAGCGT
1101 TGTCCCCTTG TGGCCTGGCC TCCGTCAGCA GAGCCCCAAG TCCACCGGAA
1151 CATTCACTCC CATGGGCTTC GGAGCAACCT TCAAGAGATC TTTCTACCTG
1201 CCTTTCCATG TCATGAGAGG AAGAAACAAG AATGACAAGT GTATGACTGC
1251 CAAAAAAAAA AAAAAAAAAC



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 201 bp to 1232 bp; peptide length: 344 
Category: similarity to unknown protein



1 MDFLVLFLFY LASVLMGLVL ICVCSKTHSL KGLARGGAQI FSCIIPECLQ

51 RAVHGLLHYL FHTRNHTFIV LHLVLQGMVY TEYTWEVFGY CQELELSLHY

101 LLLPYLLLGV NLFFFTLTCG TNPGIITKAN ELLFLHVYEF DEVMFPKNVR

151 CSTCDLRKPA RSKHCSVCNW CVHRFDHHCV WVNNCIGAWN IRYFLIYVLT

201 LTASAATVAI VSTTFLVHLV VMSDLYQETY IDDLGHLHVM DTVILIQYLF

251 LTFPRIVFML GFVVVLSFLL GGYLLSVLYL AATNQTTNEW YRGVWAWCQR

301 CPLVAWPPSA EPQVHRNIHS HGLRSNLQEI FLPAFPCHER KKQE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72112, frame 3

TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid c13G1., N = 2, Score = 247, P = 1.4e-22

TREMBL:CED2021_3 gene: "D2021.2"; Caenorhabditis elegans cosmid
D2021., N = 1, Score = 209, P = 9e-17

TREMBL:CEC43H6_2 gene: "C43H6.7"; Caenorhabditis elegans cosmid
C43H6., N = 1, Score = 206, P = 5.2e-15

PIR:S52691 probable membrane protein YDR126w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 207, P = 8.4e-15

PIR:E71607 metal binding protein (DHHC domain) PFB0725c - malaria
parasite (Plasmodium falciparum), N = 1, Score = 182, P = 1.1e-13



>TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid c13G1.
Length = 356 

HSPs: 

Score = 247 (37.1 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22 
Identities = 55/148 (37%), Positives = 85/148 (57%) 

Query: 52 AVHGLLHYLFHTRNH--TFIVLHLVLQGM----VYTEYTWEVFGYCQELELSLHYLLLPY 105

A+ L +Y+ + H F+ L L+ G+ +Y + F + + L +LLPY

Sbjct: 64 AMRSLSNYVLYKNNPLVVFLYLALITIGIASFFIYGSSLTQKFSIIDWISV-LTSVLLPY 122

Query: 106 LLLGVNLFFFTLTCGTNPGIITKANELLFLHVYEFD-EVMFPKNVRCSTCDLRKPARSKH 164

++L+ + +NPG I N + +D ++ FP +CSTC KPARSKH

Sbjct: 123 ISLY-------IAAKSNPGKIDLKNWNEASRRFPYDYKIFFPN--KCSTCKFEKPARSKH 173

Query: 165 CSVCNWCVHRFDHHCVWVNNCIGAWNIRYFLIYVL 199

C +CN CV +FDHHC+W+NNC+G N RYF +++L
Sbjct: 174 CRLCNICVEKFDHHCIWINNCVGLNNARYFFLFLL 208



Score = 43 (6.5 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 10/35 (28%), Positives = 17/35 (48%)



Query: 257 VFMLGFVV-VLSFLLGGYLLSVLYLAATNQTTNEW 290 

VF+ + + VL L GY ++Y T + +W 
Sbjct: 254 VFLISLICSVLVLCLLGYEFFLVYAGYTTNESEKW 288 
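
The percentages in the Identities and Positives lines of these reports appear to be truncated rather than rounded (for example, 17/35 is 48.57% but is shown as 48%). The sketch below reproduces the reported figures under that assumption; the helper name is illustrative and does not come from the BLAST software itself.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage obtained by truncation (floor), not rounding."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# (matches, aligned length, percentage as printed in the HSP headers)
HSP_FIGURES = [
    (55, 148, 37), (85, 148, 57),    # first HSP against SPBC13G1_7
    (10, 35, 28), (17, 35, 48),      # second HSP against SPBC13G1_7
    (110, 116, 94),                  # a case where rounding would give 95
]

for matches, length, printed in HSP_FIGURES:
    assert truncated_percent(matches, length) == printed
```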



Pedant information for DKFZphfbr2_72112, frame 3 



Report for DKFZphfbr2_72112.3



[LENGTH] 344

[MW] 39677.23

[pI] 7.26

[HOMOL] TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein"; S.pombe
chromosome II cosmid C13G1. 3e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR126w] 1e-16

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 8e-05

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 8e-05

[PIRKW] transmembrane protein 4e-15

[SUPFAM] ankyrin repeat homology 1e-10

[SUPFAM] unassigned ankyrin repeat proteins 1e-10

[PROSITE] MYRISTYL 4 

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] SIGNAL_PEPTIDE 30 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 16.57 % 



SEQ MDFLVLFLFYLASVLMGLVLICVCSKTHSLKGLARGGAQIFSCIIPECLQRAVHGLLHYL 

SEG 

PRD ccchhhhhhhhhhhhhhheeeeeeccccceeeeecccceeeeeeehhhhhhhhhhhheee 

MEM 

SEQ FHTRNHTFIVLHLVLQGMVYTEYTWEVFGYCQELELSLHYLLLPYLLLGVNLFFFTLTCG 

SEG xxxxxxxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhccchhhhhhhheeeeccceeehhhhhhhhhhhhhhcccceeeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TNPGIITKANELLFLHVYEFDEVMFPKNVRCSTCDLRKPARSKHCSVCNWCVHRFDHHCV

SEG 

PRD ccccccccccchhhhhhhhhcccccccceeeecccccccccccccccceeeecccccccc 

MEM M MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ WVNNCIGAWNIRYFLIYVLTLTASAATVAIVSTTFLVHLVVMSDLYQETYIDDLGHLHVM 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhccchhhhhhhhhhhhhhhhhccccccccccccccccchh 

MEM 

SEQ DTVILIQYLFLTFPRIVFMLGFVVVLSFLLGGYLLSVLYLAATNQTTNEWYRGVWAWCQR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccccccccceeecccchhhhhhhhhcccchhhhhhhhhhhcccc 

MEM 

SEQ CPLVAWPPSAEPQVHRNIHSHGLRSNLQEIFLPAFPCHERKKQE 

SEG 

PRD cccccccccccccceeecccccccccceeeeecccccccccccc 

MEM 



Prosite for DKFZphfbr2_72112.3

PS00001 65->69 ASN_GLYCOSYLATION PDOC00001
PS00001 284->288 ASN_GLYCOSYLATION PDOC00001
PS00005 29->32 PKC_PHOSPHO_SITE PDOC00005
PS00006 152->156 CK2_PHOSPHO_SITE PDOC00006
PS00006 229->233 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00008 32->38 MYRISTYL PDOC00008
PS00008 77->83 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 322->328 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_72112.3)

DKFZphfbr2_72ml6



group: unknown 

DKFZphfbr2_72ml6 encodes a novel 287 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 
Sequenced by LMU 

Locus: /map="26.2 cR from top of Chr16 linkage group"
Insert length: 1462 bp 

Poly A stretch at pos. 1441, polyadenylation signal at pos. 1421 



1 GGGGAGGACC GGAGGACCGA GGACAGAAAG ATTGGTGGAC AGGAGCAGCG 

51 GCCGGTGGGG AGGGCGCTCG GCGGCGGCCT GCGGCCATGG CCACCGTGAT 

101 GGCAGCGACG GCGGCGGAGC GGGCGGTGCT GGAGGAGGAG TTCCGCTGGC 

151 TGCTGCACGA CGAGGTGCAC GCTGTGTTGA AGCAGCTGCA GGACATCCTC 

201 AAGGAGGCCT CTCTGCGCTT CACTCTGCCG GGCTCCGGCA CTGAGGGGCC 

251 CGCCAAGCAA GAGAACTTCA TCCTAGGCAG CTGTGGCACA GACCAGGTGA 

301 AGGGTGTGCT GACTCTGCAG GGGGATGCCC TCAGCCAGGC GGATGTGAAC 

351 CTGAAGATGC CCCGGAACAA CCAGCTGCTG CACTTCGCCT TCCGGGAGGA 

401 CAAGCAGTGG AAGCTGCAGC AGATCCAGGA TGCCAGAAAC CATGTGAGCC 

451 AAGCCATTTA CCTGCTTACC AGCCGGGACC AGAGCTACCA GTTCAAGACG 

501 GGCGCTGAGG TCCTCAAGCT GATGGACGCA GTGATGCTGC AGCTGACCAG 

551 AGCCCGAAAC CGGCTCACCA CCCCCGCCAC CCTCACCCTC CCCGAGATCG 

601 CCGCCAGCGG CCTCACGCGG ATGTTCGCCC CTGCCCTGCC GTCCGACCTG 

651 CTGGTCAACG TCTACATCAA CCTCAACAAG CTCTGCCTCA CGGTGTACCA 

701 GCTGCATGCC CTGCAGCCCA ACTCCACCAA GAACTTCCGC CCAGCTGGGG 

751 GCGCGGTGCT GCATAGCCCT GGGGCCATGT TCGAGTGGGG CTCTCAGCGC 

801 CTGGAGGTGA GCCACGTGCA CAAAGTGGAG TGCGTGATCC CCTGGCTCAA 

851 CGACGCCCTG GTCTACTTCA CCGTCTCCCT GCAGCTCTGC CAGCAGCTTA 

901 AGGACAAGAT CTCCGTGTTC TCCAGCTACT GGAGCTACAG ACCCTTCTGA 

951 TCACAGCACC CAGGAGCTTG TCTCCAGGAA GGCGGCCCCG TCCCCTACTC 

1001 ATACCCACCA CAGAGCACCA GCCAGTGCCA ACGCCAGGCT GCTATTTATC 

1051 TCCCTATCCC ACCCCCTACC CCACCTAACA CATTTGCACT GCCGGGAATG 

1101 GACACTGGAA GTGCCAGGAG GAAGGAAGGC TGGTTTGGTG GGGTAGTGGG 

1151 GAGGTCAGGG AGGCGGGGCC AAGGGTGTCC CACATTCCCA ACACCGCCCT 

1201 CTGATCACCA TGGGAATCTT TGGACTCAGG ACAGGGCCAG GCGCAGGGCT 

1251 CTCCCTCCTC TCCCCTTCGC TGTCCCCTCC CCCTGGAGGG CATGGTGTCG 

1301 GGGGGTGGCA CTGAGCTATG AGTCCCGGGG ATGGTGAGGA ACGCCACAGA 

1351 CAGAGCCACC CTAGGAGTGA GTATAGTGCT GGTGACTGTG TTTCATAGCC 

1401 CCAGTCCAGG GCTGTCTAAG AAATAAAGAT CATCAGACTC CAAAAAAAAA 

1451 AAAAAAAAAA AC 
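
The poly A and polyadenylation signal positions quoted for this clone can be reproduced from the last two rows of the sequence. The sketch below is only an illustration: it assumes the signal is the canonical AATAAA hexamer, uses an arbitrary threshold of ten consecutive A residues for the poly A stretch, and assumes the reported positions are 0-based offsets into the insert (the first A of the stretch is base 1442 when counted from 1). None of these conventions is stated in the listing itself.

```python
import re

# Bases 1401-1462 of the insert (1-based), copied from the two rows above.
TAIL_START = 1401
TAIL = (
    "CCAGTCCAGG" "GCTGTCTAAG" "AAATAAAGAT" "CATCAGACTC" "CAAAAAAAAA"
    "AAAAAAAAAA" "AC"
)


def find_polya_features(fragment, start_pos, min_run=10):
    """Return (signal_offset, polya_offset) as 0-based offsets into the insert.

    start_pos is the 1-based position of the first base of `fragment`.
    Either value is None when the feature is not found.
    """
    offset0 = start_pos - 1                       # 0-based offset of fragment[0]
    signal = fragment.find("AATAAA")              # assumed canonical hexamer
    run = re.search("A{%d,}" % min_run, fragment)  # assumed minimum run length
    return (
        offset0 + signal if signal >= 0 else None,
        offset0 + run.start() if run else None,
    )


signal_pos, polya_pos = find_polya_features(TAIL, TAIL_START)
print(signal_pos, polya_pos)  # 1421 1441, matching the positions quoted above
```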



BLAST Results 



Entry HS604351 from database EMBL: 
human STS WI-18474. 
Score = 1178, P = 1.5e-48, identities = 250/268 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 87 bp to 947 bp; peptide length: 287 
Category: similarity to unknown protein



1 MATVMAATAA ERAVLEEEFR WLLHDEVHAV LKQLQDILKE ASLRFTLPGS
51 GTEGPAKQEN FILGSCGTDQ VKGVLTLQGD ALSQADVNLK MPRNNQLLHF
101 AFREDKQWKL QQIQDARNHV SQAIYLLTSR DQSYQFKTGA EVLKLMDAVM
151 LQLTRARNRL TTPATLTLPE IAASGLTRMF APALPSDLLV NVYINLNKLC
201 LTVYQLHALQ PNSTKNFRPA GGAVLHSPGA MFEWGSQRLE VSHVHKVECV
251 IPWLNDALVY FTVSLQLCQQ LKDKISVFSS YWSYRPF
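
The relationship between the nucleotide listing and the frame 3 peptide can be checked by translating the start of the ORF. The sketch below uses the standard genetic code and the two sequence rows beginning at positions 51 and 101 of this clone; the helper names are hypothetical, and the ORF start of 87 bp is taken from the peptide information above.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Rows starting at positions 51 and 101 of the DKFZphfbr2_72ml6 insert.
ROW_51 = "GCCGGTGGGG" "AGGGCGCTCG" "GCGGCGGCCT" "GCGGCCATGG" "CCACCGTGAT"
ROW_101 = "GGCAGCGACG" "GCGGCGGAGC" "GGGCGGTGCT" "GGAGGAGGAG" "TTCCGCTGGC"


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def orf_prefix(fragment, fragment_start, orf_start, n_codons):
    """Translate the first n_codons of an ORF; positions are 1-based."""
    i = orf_start - fragment_start
    return translate(fragment[i:i + 3 * n_codons])


peptide = orf_prefix(ROW_51 + ROW_101, 51, 87, 20)
print(peptide)  # MATVMAATAAERAVLEEEFR, the first 20 residues listed above
```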

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72ml6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_72ml6, frame 3



Report for DKFZphfbr2_72ml6.3



[LENGTH] 287

[MW] 32254.40

[pI] 8.30

[HOMOL] TREMBL:AF025459_2 gene: "H14A12.3"; Caenorhabditis elegans cosmid H14A12. 3e-14

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.27 % 



SEQ MATVMAATAAERAVLEEEFRWLLHDEVHAVLKQLQDILKEASLRFTLPGSGTEGPAKQEN 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhh 

SEQ FILGSCGTDQVKGVLTLQGDALSQADVNLKMPRNNQLLHFAFREDKQWKLQQIQDARNHV 

SEG 

PRD hhccccccceeeeeeeeccccchhhhhhhcccccchhhhhhhhhchhhhhhhhhhhhchh 

SEQ SQAIYLLTSRDQSYQFKTGAEVLKLMDAVMLQLTRARNRLTTPATLTLPEIAASGLTRMF 

SEG 

PRD hhhhhhhhccccceeecchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccc 

SEQ APALPSDLLVNVYINLNKLCLTVYQLHALQPNSTKNFRPAGGAVLHSPGAMFEWGSQRLE 

SEG 

PRD cccccccceeeeehhhhhhhhhhheeeecccccccccccccceeecccccccccccccee 

SEQ VSHVHKVECVIPWLNDALVYFTVSLQLCQQLKDKISVFSSYWSYRPF

SEG 

PRD eeeeeeeeeeeecccceeeeeeehhhhhhhhhhhhheeeeeeeeccc 



Prosite for DKFZphfbr2_72ml6.3

PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00005 42->45 PKC_PHOSPHO_SITE PDOC00005
PS00005 128->131 PKC_PHOSPHO_SITE PDOC00005
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005
PS00005 236->239 PKC_PHOSPHO_SITE PDOC00005
PS00005 283->286 PKC_PHOSPHO_SITE PDOC00005
PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006
PS00006 50->54 CK2_PHOSPHO_SITE PDOC00006
PS00006 83->87 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 138->142 CK2_PHOSPHO_SITE PDOC00006
PS00006 167->171 CK2_PHOSPHO_SITE PDOC00006
PS00008 64->70 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_72ml6.3)

DKFZphfbr2_72nl2



group: brain derived 

DKFZphfbr2_72nl2 encodes a novel 117 amino acid protein with similarity to a protein with
conserved sequence in bacteria and eukaryotes.

The novel protein is very similar to human MM46, human and rat ganglioside expression factor
2 (GEF-2), C. elegans 14.8 kD protein C32D5.9 and Laccaria bicolor symbiosis-related protein
LBU93506_1. The function of these highly conserved proteins is not known.

The new protein can find application in studying the expression profile of brain-specific
genes.



strong similarity to rat GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2) 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="12" 
Insert length: 1880 bp 

Poly A stretch at pos. 1859, polyadenylation signal at pos. 1830



1 GGGGGCCGGT ATTTCTCCAT CTGGCTCTCC TCTACCTCCA GGCAGGCTCA
51 CCCGAGATCC CCGCCCCGAA CCCCCCCTGC ACACTCGGCC CAGCGCTGTT
101 GCCCCCGGAG CGGACGTTTC TGCAGCTATT CTGAGCACAC CTTGACGTCG
151 GCTGAGGGAG CGGGACAGGG TCAGCGGCGA AGGAGGCAGG CCCCGCGCGG
201 GGATCTCGGA AGCCCTGCGG TGCATCATGA AGTTCCAGTA CAAGGAGGAC
251 CATCCCTTTG AGTATCGGAA AAAGGAAGGA GAAAAGATCC GGAAGAAATA
301 TCCGGACAGG GTCCCCGTGA TTGTAGAGAA GGCTCCAAAA GCCAGGGTGC
351 CTGATCTGGA CAAGAGGAAG TACCTAGTGC CCTCTGACCT TACTGTTGGC
401 CAGTTCTACT TCTTAATCCG GAAGAGAATC CACCTGAGAC CTGAGGACGC
451 CTTATTCTTC TTTGTCAACA ACACCATCCC TCCCACCAGT GCTACCATGG
501 GCCAACTGTA TGAGGACAAT CATGAGGAAG ACTATTTTCT GTATGTGGCC
551 TACAGTGATG AGAGTGTCTA TGGGAAATGA GTGGTTGGAA GCCCAGCAGA
601 TGGGAGCACC TGGACTTGGG GGTAGGGGAG GGGTGTGTGT GCGCGACATG
651 GGGAAAGAGG GTGGCTCCCA CCGCAAGGAG ACAGAAGGTG AAGACATCTA
701 GAAACATTAC ACCACACACA CCGTCATCAC ATTTTCACAT GCTCAATTGA
751 TATTTTTTGC TGCTTCCTCG GCCCAGGGAG AAAGCATGTC AGGACAGAGC
801 TGTTGGATTG GCTTTGATAG AGGAATGGGG ATGATGTAAG TTTACAGTAT
851 TCCTGGGGTT TAATTGTTGT GCAGTTTCAT AGATGGGTCA GGAGGTGGAC
901 AAGTTGGGGC CAGAGATGAT GGCAGTCCAG CAGCAACTCC CTGTGCTCCC
951 TTCTCTTTGG GCAGAGATTC TATTTTTGAC ATTTGCACAA GACAGGTAGG
1001 GAAAGGGGAC TTGTGGTAGT GGACCATACC TGGGGACCAA AAGAGACCCA
1051 CTGTAATTGA TGCATTGTGG CCCCTGATCT TCCCTGTCTC ACACTTCTTT
1101 TCTCCCATCC CGGTTGCAAT CTCACTCAGA CATCACAGTA CCACCCCAGG
1151 GGTGGCAGTA GACAACAACC CAGAAATTTA GACAGGGATC TCTTACCTTT
1201 GGAAAATAGG GGTTAGGCAT GAAGGTGGTT GTGATTAAGA AGATGGTTTT
1251 GTTATTAAAT AGCATTAAAC TGGAATTGAC AAGAGTGTTG AGCATCCCTG
1301 TCTAACCTGC TCTTTCTCTT TGGTGCCCCT TATCTCACCC CTTCCTTGGA
1351 ATTTAATAAG TCTCAGGCAT TTCCAATTGT AGACTAAAAC CACTCTTAGC
1401 ATCTCCTCTA GTATTTTCCA TGTATCAGGA AAGAGGTGTC TTATGTAGGG
1451 AGGGGGCAAG TATGAAGTAA GGTAATTATA TACTACTCTC ATTCAGGATT
1501 CTTGCTCCCA TGCTGCTGTC CCTTCAGGCT CACATGCACA GGAATGCTAC
1551 ATGATGGCCA GCTGCTTCCC TCCTTGGTTA TCATCCACTG CAGCTGCTAG
1601 TTAGAAAGGT TTGGAGGGAT GACTTTTAGT AAATCATGGG GATTTTATTG
1651 ATTTATTTTC ACTTTTGGGA TTTTGTGGGG TGGGAGTGGG GAGCAGGAAT
1701 TGCACTCAGA CATGACATTT CAATTCATCT CTGCTAATGA AAAGGGTTCT
1751 TTCTCTTGGG GGAAATGTGT GTGTCAGTTC TGTCAGCTGC AAGTTCTTGT
1801 ATAATGAAGT CAATGCCATC AGGCCAAGGA AATAAAATAA TTGCTTACCT
1851 TAAAAATCGA AAAAAAAAAA AAAAAAAAAC



BLAST Results 



Entry HS418210 from database EMBL:
human STS SHGC-10496.
Score = 1916, P = 4.0e-80, identities = 394/400

Entry AC006514 from database EMBLNEW:
*** SEQUENCING IN PROGRESS *** Homo sapiens; HTGS phase 1, 68 unordered pieces.
Score = 610, P = 2.7e-16, identities = 128/134
4 exons



Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 227 bp to 577 bp; peptide length: 117 
Category: strong similarity to known protein 



1 MKFQYKEDHP FEYRKKEGEK IRKKYPDRVP VIVEKAPKAR VPDLDKRKYL
51 VPSDLTVGQF YFLIRKRIHL RPEDALFFFV NNTIPPTSAT MGQLYEDNHE 
101 EDYFLYVAYS DESVYGK 

BLASTP hits 

Entry YQD9_CAEEL from database SWISSPROT:
HYPOTHETICAL 14.8 KD PROTEIN C32D5.9 IN CHROMOSOME II.
Score = 496, P = 1.8e-47, identities = 91/116, positives = 105/116

Entry SYRP_LACBI from database SWISSPROT:
SYMBIOSIS-RELATED PROTEIN.
Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117

Entry LBU93506_1 from database TREMBL:
product: "symbiosis-related protein"; Laccaria bicolor
symbiosis-related protein mRNA, partial cds.
Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117

Entry GEF2_RAT from database SWISSPROT: 
GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2). 

Score = 373, P = 2.0e-34, identities = 71/116, positives = 88/116 



Alert blastp hits for DKFZphfbr2_72nl2, frame 2 

TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds., N = 1, Score = 549, P = 4.7e-53

SWISSPROT:GEF2_HUMAN GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2)., N = 1,
Score = 373, P = 2.1e-34



>TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds.

Length = 117 

HSPs: 

Score = 549 (82.4 bits), Expect = 4.7e-53, P = 4.7e-53 
Identities = 101/116 (87%), Positives = 110/116 (94%) 

Query: 1 MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 60

MKF YKE+HPFE R+ EGEKIRKKYPDRVPVIVEKAPKAR+ DLDK+KYLVPSDLTVGQF
Sbjct: 1 MKFVYKEEHPFEKRRSEGEKIRKKYPDRVPVIVEKAPKARIGDLDKKKYLVPSDLTVGQF 60

Query: 61 YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYG 116

YFLIRKRIHLR EDALFFFVNN IPPTSATMGQLY+++HEED+FLY+AYSDESVYG
Sbjct: 61 YFLIRKRIHLRAEDALFFFVNNVIPPTSATMGQLYQEHHEEDFFLYIAYSDESVYG 116



Pedant information for DKFZphfbr2_72nl2, frame 2 



Report for DKFZphfbr2_72nl2.2



[LENGTH] 117

[MW] 14044.07

[pI] 8.67

[HOMOL] TREMBL:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete cds. 1e-56

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBL078c] 4e-36

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YBL078c] 4e-36 

[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YBL078c] 4e-36 

[SUPFAM] hypothetical protein YBL078C 8e-35 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 



SEQ MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 

PRD cccccccccchhhhhhhhhhhhhhccccceeeeeccccccccccccceeecccccchhhh 

SEQ YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYGK 

PRD hhhhhhhhhhccccceeeeecccccccchhhhhhhhhccccceeeeeeecccccccc 



Prosite for DKFZphfbr2_72nl2.2
PS00001 81->85 ASN_GLYCOSYLATION PDOC00001
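
The single Prosite hit for this peptide can be reproduced with a regular expression. The sketch below uses the published PS00001 consensus N-{P}-[ST]-{P} and reports hits in the same start->end style as the table, on the assumption (inferred from the motif lengths in these tables, not stated in the listing) that the end coordinate is the start plus the motif length. The function name is illustrative.

```python
import re

# Frame 2 peptide of DKFZphfbr2_72nl2 as listed above (117 residues).
PROTEIN = (
    "MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYL"
    "VPSDLTVGQFYFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHE"
    "EDYFLYVAYSDESVYGK"
)

# PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
# The lookahead lets overlapping candidate sites be reported as well.
ASN_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def scan(pattern, sequence):
    """Return (start, end) pairs, 1-based start, end = start + motif length."""
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in pattern.finditer(sequence)
    ]


hits = scan(ASN_GLYCOSYLATION, PROTEIN)
print(hits)  # [(81, 85)], i.e. the row "PS00001 81->85" above
```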



(No Pfam data available for DKFZphfbr2_72nl2.2)

DKFZphfbr2_78c24



group: signal transduction 

DKFZphfbr2_78c24 encodes a novel 563 amino acid protein with strong similarity to guanylate-binding
proteins (GBPs).

GBPs were originally described as proteins that are strongly induced by interferons and are
capable of binding to agarose-immobilized guanine nucleotides. hGBP1, the first of two members
of this protein family in humans, represents a novel type of GTPase. The novel protein
contains an ATP/GTP-binding site motif A (P-loop) and an RGD cell attachment site. It seems to
be a new member of the GBP family and shows a splicing pattern not described previously.

The new protein can find application in modulating/blocking the response of cells to
interferons.



strong similarity to guanine nucleotide-binding protein 1/2,
but a different "splice variant": aa 211-245 of GBP1/2 are missing



Sequenced by MediGenomix 
Locus: unknown 



Insert length: 2952 bp 

Poly A stretch at pos. 2927, polyadenylation signal at pos. 2914



1 CAGTTTCATT AGGCTCTGAA GCCATTACAA AGGTTGCTTA ACTTCTAATT 
51 ATTTGATCAC TGAGGAAAAT CCAGAAAGCT ACACAACACT GAAGGGGTGA 
101 AATAAAAGTC CAGCGATCCA GCGAAAGAAA AGAGAAGTGA CAGAAACAAC 
151 TTTACCTGGA CTGAAGATAA AAGCACAGAC AAGAGAACAA TGCCCTGGAC 
201 ATGGCTCCAG AGATCCACAT GACAGGCCCA ATGTGCCTCA TTGAGAACAC 
251 TAATGGGGAA CTGGTGGCGA ATCCAGAAGC TCTGAAAATC CTGTCTGCCA 
301 TTACACAGCC TGTGGTGGTG GTGGCAATTG TGGGCCTCTA CCGCACAGGA 
351 AAATCCTACC TGATGAACAA GCTAGCTGGG AAGAATAAGG GCTTCTCTCT 
401 GGGCTCCACA GTGAAATCTC ACACCAAAGG AATCTGGATG TGGTGTGTGC
451 CTCACCCCAA AAAGCCAGAA CACACCTTAG TCCTGCTTGA CACTGAGGGC 
501 CTGGGAGATG TAAAGAAGGG TGACAACCAG AATGACTCCT GGATCTTCAC 
551 CCTGGCCGTC CTCCTGAGCA GCACTCTCGT GTACAATAGC ATGGGAACCA 
601 TCAACCAGCA GGCTATGGAC CAACTGTACT ATGTGACAGA GCTGACACAT 
651 CGAATCCGAT CAAAATCCTC ACCTGATGAG AATGAGAATG AGGATTCAGC 
701 TGACTTTGTG AGCTTCTTCC CAGATTTTGT GTGGACACTG AGAGATTTCT 
751 CCCTGGACTT GGAAGCAGAT GGACAACCCC TCACACCAGA TGAGTACCTG 
801 GAGTATTCCC TGAAGCTAAC GCAAGGTAAC AGGAAGCTTG CCCAGCTTGA 
851 GAAACTACAA GATGAAGAGC TGGACCCTGA ATTTGTGCAA CAAGTAGCAG 
901 ACTTCTGTTC CTACATCTTT AGCAATTCCA AAACTAAAAC TCTTTCAGGA 
951 GGCATCAAGG TCAATGGGCC TTGTCTAGAG AGCCTAGTGC TGACCTATAT 
1001 CAATGCTATC AGCAGAGGGG ATCTGCCCTG CATGGAGAAC GCAGTCCTGG 
1051 CCTTGGCCCA GATAGAGAAC TCAGCCGCAG TGCAAAAGGC TATTGCCCAC 
1101 TATGACCAGC AGATGGGCCA GAAGGTGCAG CTGCCCGCAG AAACCCTCCA 
1151 GGAGCTGCTG GACCTGCACA GGGTTAGTGA GAGGGAGGCC ACTGAAGTCT 
1201 ATATGAAGAA CTCTTTCAAG GATGTGGACC ATCTGTTTCA AAAGAAATTA 
1251 GCGGCCCAGC TAGACAAAAA GCGGGATGAC TTTTGTAAAC AGAATCAAGA 
1301 AGCATCATCA GATCGTTGCT CAGCTTTACT TCAGGTCATT TTCAGTCCTC 
1351 TAGAAGAAGA AGTGAAGGCG GGAATTTATT CGAAACCAGG GGGCTATTGT 
1401 CTCTTTATTC AGAAGCTACA AGACCTGGAG AAAAAGTACT ATGAGGAACC
1451 AAGGAAGGGG ATACAGGCTG AAGAGATTCT GCAGACATAC TTGAAATCCA 
1501 AGGAGTCTGT GACCGATGCA ATTCTACAGA CAGACCAGAT TCTCACAGAA 
1551 AAGGAAAAGG AGATTGAAGT GGAATGTGTA AAAGCTGAAT CTGCACAGGC 
1601 TTCAGCAAAA ATGGTGGAGG AAATGCAAAT AAAGTATCAG CAGATGATGG 
1651 AAGAGAAAGA GAAGAGTTAT CAAGAACATG TGAAACAATT GACTGAGAAG 
1701 ATGGAGAGGG AGAGGGCCCA GTTGCTGGAA GAGCAAGAGA AGACCCTCAC 
1751 TAGTAAACTT CAGGAACAGG CCCGAGTACT AAAGGAGAGA TGCCAAGGTG 
1801 AAAGTACCCA ACTTCAAAAT GAGATACAAA AGCTACAGAA GACCCTGAAA 
1851 AAAAAAACCA AGAGATATAT GTCGCATAAG CTAAAGATCT AAACAACAGA 
1901 GCTTTTCTGT CATCCTAACC CAAGGCATAA CTGAAACAAT TTTAGAATTT 
1951 GGAACAAGTG TCACTATATT TGATAATAAT TAGATCTTGC ATCATAACAC 
2001 TAAAAGTTTA CAAGAACATG CAGTTCAATG ATCAAAATCA TGTTTTTTCC
2051 TTAAAAAGAT TGTAAATTGT GCAACAAAGA TGCATTTACC TCTGTACCAA
2101 CAGAGGAGGG ATCATGAGTT GCCACCACTC AGAAGTTTAT TCTTCCAGAC 
2151 GACCAGTGGA TACTGAGGAA AGTCTTAGGT AAAAATCTTG GGACATATTT 
2201 GGGCACTGGT TTGGCCAAGT GTACAATAGG TCCCAATATC AGAAACAACC 
2251 ATCCTAGCTT CCTAGGGAAG ACAGTGTACA GTTCTCCATT ATATCAAGGC 
2301 TACAAGGTCT ATGAGCAATA ATGTGATTTC TGGACATTGC CCATGGATAA 
2351 TTCTCACTGA TGGATCTCAA GCTAAAGCAA ACCATCTTAT ACAGAGATCT
2401 AGAATCTTAT ATTTTCCATA GGAAGGTAAA GAAATCATTA GCAAGAGTAG 
2451 GAATTGAATC ATAAACAAAT TGGCTAATGA AGAAATCTTT TCTTTCTTGT 
2501 TCAATTCATC TAGATTATAA CCTTAATGTG ACACCTGAGA CCTTTAGACA
2551 GTTGACCCTG AATTAAATAG TCACATGGTA ACAATTATGC ACTGTGTAAT
2601 TTTAGTAATG TATAACATGC AATGATGCAC TTTAACTGAA GATAGAGACT
2651 ATGTTAGAAA ATTGAACTAA TTTAATTATT TGATTGTTTT AATCCTAAAG
2701 CATAAGTTAG TCTTTTCCTG ATTCTTAAAG GTCATACTTG AAATCCTGCC
2751 AATTTTCCCC AAAGGGAATA TGGAATTTTT TTTGACTTTC TTTTGAGCAA
2801 TAAAATAATT GTCTTGCCAT TACTTAGTAT ATGTAGACTT CATCCCAATT
2851 GTCAAACATC CTAGGTAAGT GGTTGACATT TCTTACAGCA ATTACAGATT
2901 ATTTTTGAAC TAGAAATAAA CTAAACTAGA AACAAAAAAA AAAAAAAAAA
2951 AA



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 201 bp to 1889 bp; peptide length: 563 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: RGD (272-275)
ATP_GTP_A (45-53)



1 MAPEIHMTGP MCLIENTNGE LVANPEALKI LSAITQPVVV VAIVGLYRTG

51 KSYLMNKLAG KNKGFSLGST VKSHTKGIWM WCVPHPKKPE HTLVLLDTEG 

101 LGDVKKGDNQ NDSWIFTLAV LLSSTLVYNS MGTINQQAMD QLYYVTELTH 

151 RIRSKSSPDE NENEDSADFV SFFPDFVWTL RDFSLDLEAD GQPLTPDEYL 

201 EYSLKLTQGN RKLAQLEKLQ DEELDPEFVQ QVADFCSYIF SNSKTKTLSG 

251 GIKVNGPCLE SLVLTYINAI SRGDLPCMEN AVLALAQIEN SAAVQKAIAH 

301 YDQQMGQKVQ LPAETLQELL DLHRVSEREA TEVYMKNSFK DVDHLFQKKL 

351 AAQLDKKRDD FCKQNQEASS DRCSALLQVI FSPLEEEVKA GIYSKPGGYC 

401 LFIQKLQDLE KKYYEEPRKG IQAEEILQTY LKSKESVTDA ILQTDQILTE 

451 KEKEIEVECV KAESAQASAK MVEEMQIKYQ QMMEEKEKSY QEHVKQLTEK 

501 MERERAQLLE EQEKTLTSKL QEQARVLKER CQGESTQLQN EIQKLQKTLK 

551 KKTKRYMSHK LKI 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78c24, frame 3

PIR:A41268 guanine nucleotide-binding protein 1 - human, N = 2, Score =
1306, P = 4.9e-238

PIR:A46459 macrophage-activation gene-1 protein mag-1 - mouse, N = 2,
Score = 942, P = 8.9e-184

PIR:S70524 guanine nucleotide-binding protein 2 - human, N = 2, Score =
1131, P = 4.1e-210

TREMBL:AF077007_1 gene: "Gbp2"; product: "interferon-induced guanylate
binding protein GBP-2"; Mus musculus interferon-induced guanylate
binding protein GBP-2 (Gbp2) mRNA, complete cds., N = 2, Score = 904, P
= 1.2e-179

>PIR:A41268 guanine nucleotide-binding protein 1 - human 
Length = 592 

HSPs:

Score = 1306 (195.9 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238
Identities = 264/332 (79%), Positives = 288/332 (86%)

Query: 211 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIKVNGPCLESLVLTYINAI 270

RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGI+VNGP LESLVLTY+NAI
Sbjct: 245 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIQVNGPRLESLVLTYVNAI 304

Query: 271 SRGDLPCMENAVLALAQIENSAAVQKAIAHYDQQMGQKVQLPAETLQELLDLHRVSEREA 330

S GDLPCMENAVLALAQIENSAAVQKAIAHY+QQMGQKVQLP E+LQELLDLHR SEREA
Sbjct: 305 SSGDLPCMENAVLALAQIENSAAVQKAIAHYEQQMGQKVQLPTESLQELLDLHRDSEREA 364

Query: 331 TEVYMKNSFKDVDHLFQKKLAAQLDKKRDDFCKQNQEASSDRCSALLQVIFSPLEEEVKA 390

 EV++++SFKDVDHLFQK+LAAQL+KKRDDFCKQNQEASSDRCS LLQVIFSPLEEEVKA
Sbjct: 365 IEVFIRSSFKDVDHLFQKELAAQLEKKRDDFCKQNQEASSDRCSGLLQVIFSPLEEEVKA 424

Query: 391 GIYSKPGGYCLFIQKLQDLEKKYYEEPRKGIQAEEILQTYLKSKESVTDAILQTDQILTX 450 

GIYSKPGGY LF+QKLQDL+KKYYEEPRKGIQAEEILQTYLKSKES+TDAILQTDQ LT 
Sbjct: 425 GIYSKPGGYRLFVQKLQDLKKKYYEEPRKGIQAEEILQTYLKSKESMTDAILQTDQTLTE 484 

Query: 451 XXXXXXXXXXXXXSAQASAKMVEEMQIKYQQMMEEKEKSYQEHVKQLTEKMXXXXXXXXX 510 

SAQASAKM++EMQ K +QMME+KE+SYQEH+KQLTEKM 
Sbjct: 485 KEKEIEVERVKAESAQASAKMLQEMQRKNEQMMEQKERSYQEHLKQLTEKMENDRVQLLK 544 

Query: 511 XXXKTLTSKLQEQARVLKERCQGESTQLQNEI 542 

+TL KLQEQ ++LKE Q ES ++NEI 
Sbjct: 545 EQERTLALKLQEQEQLLKEGFQKESRIMKNEI 576

Score = 1012 (151.8 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238 
Identities = 194/211 (91%), Positives = 200/211 (94%) 

Query: 1 MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 60

MA EIHMTGPMCLIENTNG L+ANPEALKILSAITQP+VVVAIVGLYRTGKSYLMNKLAG
Sbjct: 1 MASEIHMTGPMCLIENTNGRLMANPEALKILSAITQPMVVVAIVGLYRTGKSYLMNKLAG 60

Query: 61 KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 120

K KGFSLGSTV+SHTKGIWMWCVPHPKKP H LVLLDTEGLGDV+KGDNQNDSWIF LAV
Sbjct: 61 KKKGFSLGSTVQSHTKGIWMWCVPHPKKPGHILVLLDTEGLGDVEKGDNQNDSWIFALAV 120

Query: 121 LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENE--DSADFVSFFPDFVW 178

LLSST VYNS+GTINQQAMDQLYYVTELTHRIRSKSSPDENENE  DSADFVSFFPDFVW
Sbjct: 121 LLSSTFVYNSIGTINQQAMDQLYYVTELTHRIRSKSSPDENENEVEDSADFVSFFPDFVW 180

Query: 179 TLRDFSLDLEADGQPLTPDEYLEYSLKLTQG 209

TLRDFSLDLEADGQPLTPDEYL YSLKL +G
Sbjct: 181 TLRDFSLDLEADGQPLTPDEYLTYSLKLKKG 211

Pedant information for DKFZphfbr2_78c24, frame 3



Report for DKFZphfbr2_78c24.3

[LENGTH] 563

[MW] 64127.72

[pI] 5.45

[HOMOL] PIR:A41268 guanine nucleotide-binding protein 1 - human 0.0 

[SUPFAM] guanine nucleotide-binding protein 1 0.0 

[PROSITE] ATP_GTP_A 1 

[PROSITE] RGD 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 6.75 % 

[KW] COILED_COIL 10.48 % 

SEQ MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG

SEG 

PRD cccccccccceeeeeccccchhhhhhhhhhhhhhhcceeeeeeeecccccchhhhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 

SEG 

PRD cccccccccccccccceeeeeecccccccceeeeeeeccccccccccccccchhhhhhhh 

COILS 

MEM 

SEQ LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENEDSADFVSFFPDFVWTL 

SEG 

PRD hhhhheeeccccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeccceeeeh 

COILS 

MEM 

SEQ RDFSLDLEADGQPLTPDEYLEYSLKLTQGNRKLAQLEKLQDEELDPEFVQQVADFCSYIF 

SEG 

PRD hhhhhhhhccccccccchhhhhhhhhhccchhhhhhhhhhhhhcccchhhhhhhhhhhhc 

COILS

MEM

SEQ SNSKTKTLSGGIKVNGPCLESLVLTYINAISRGDLPCMENAVLALAQIENSAAVQKAIAH 

SEG 

PRD cccceeeccccccccccchhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ YDQQMGQKVQLPAETLQELLDLHRVSEREATEVYMKNSFKDVDHLFQKKLAAQLDKKRDD 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ FCKQNQEASSDRCSALLQVIFSPLEEEVKAGIYSKPGGYCLFIQKLQDLEKKYYEEPRKG 

SEG 

PRD hhhhhhchhhhhhhhhhhhhhhhhhhhhhcccccccccceeehhhhhhhhhhhhhccccc 

COILS 

MEM 

SEQ IQAEEILQTYLKSKESVTDAILQTDQILTEKEKEIEVECVKAESAQASAKMVEEMQIKYQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ QMMEEKEKSYQEHVKQLTEKMERERAQLLEEQEKTLTSKLQEQARVLKERCQGESTQLQN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ EIQKLQKTLKKKTKRYMSHKLKI 

SEG ..xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhccc 

COILS CCCCCCC 

MEM 



Prosite for DKFZphfbr2_78c24.3

PS00016 272->275 RGD PDOC00016

PS00017 45->53 ATP_GTP_A PDOC00017



(No Pfam data available for DKFZphfbr2_78c24.3)

DKFZphfbr2_78dl3



group: brain derived 

DKFZphfbr2_78dl3 encodes a novel 259 amino acid protein with similarity to C. elegans putative 
protein from cosmid K08B12. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to C. elegans K08B12.3 
Sequenced by MediGenomix 

Locus: /map="338.4 cR from top of ChrlB linkage group" 
Insert length: 2195 bp 

Poly A stretch at pos. 2175, polyadenylation signal at pos. 2156



1 CGTCCGTCGG GCAGCAGCGG GGCTGTCTAT CCCGGCTGAG GACCCGCGGC 
51 CAGTGCGGGT GGCTGGCTTT GCCATTAGCG GGGGCCTTTC CTGAGGACGG 
101 CGTACGGAGT GTGGGGAATG AAGGATGGCA GCATGCCGTG CATTAAAAGC 
151 TGTTTTGGTA GATCTCAGTG GCACACTTCA CATTGAAGAT GCAGCTGTGC 
201 CAGGCGCACA GGAAGCTCTT AAAAGGTTAC GTGGTGCTTC TGTAATCATT 
251 AGGTTTGTGA CCAATACAAC CAAAGAGAGC AAGCAAGACC TGTTAGAAAG 
301 GTTGAGAAAA TTGGAATTTG ATATCTCTGA AGATGAAATA TTCACATCTC 
351 TGACTGCAGC CAGAAGTTTA CTAGAGCGGA AACAAGTCAG ACCCATGCTG 
401 CTAGTTGATG ATCGGGCACT ACCTGATTTC AAAGGAATAC AAACAAGTGA 
451 TCCTAATGCT GTGGTCATGG GATTGGCACC AGAACATTTT CATTATCAAA 
501 TTCTGAATCA AGCATTCCGG TTACTCCTGG ATGGAGCACC TCTGATAGCA 
551 ATCCACAAAG CCAGGTATTA CAAGAGGAAA GATGGCTTAG CCCTGGGGCC 
601 TGGACCATTT GTGACTGCTT TAGAGTATGC CACAGATACC AAAGCCACAG 
651 TCGTGGGGAA ACCAGAGAAG ACGTTCTTTT TGGAAGCATT GCGGGGCACT 
701 GGCTGTGAAC CTGAGGAGGC TGTCATGATA GGAGATGATT GCAGGGATGA
751 TGTTGGTGGG GCTCAAGATG TCGGCATGCT GGGCATCTTA GTAAAGACTG 
801 GGAAATATCG AGCATCAGAT GAAGAAAAAA TTAATCCACC TCCTTACTTA 
851 ACTTGTGAGA GTTTCCCTCA TGCTGTGGAC CACATTCTGC AGCACCTATT 
901 GTGAAGCAAT GTGTGCATCT GAAGCAACTT GAAATGCAGC TTCTTATTGT 
951 CTGGAATGAA TCCCTTACCA ACTCAGTGCC AGCATCGGTA GACACCAGTC 
1001 AGTGCTGATC GCTTTTTAAC CCTCTTTTGT TGTGCATTAA TTAGAAAGAA 
1051 AGGTATTGAA TTGCGGCTAG CCAGTAAGCC TTGCTAATCT CTTTTATTTT 
1101 GTAACTGAAG ATGAGACCCA AAGAAAGGCA AAGCTGAGAT TTTGTGCCAT 
1151 TCCTTTTAAA ATATTCATCA GGTTAGGTGG GGCTGTGGGG GAAAAGCTAC 
1201 TACAGGGAAG AGTGTTCTCT GCTGTCTCTT CACTGGAAAA CAGGGAGGGG 
1251 GGATTTCAGA CTGTGAAGAA AGTTGAATGG TGGTTTTTAA ATTATAAAGT 
1301 AATGTATTAA AAGGTGCATT AGGCTGTAGT TCTAATATTG AGTTCAACTG 
1351 TGAAATCCAT CAGATGTGCC AAATGGAGAA GACAGAAAGC AACAAAGTGA 
1401 ATTGTTCTTT AGCCCAAGTG GTACAGTGAA TTTGCTTTAA CAGATGTTGA
1451 AAACTAAATT TTCTACTGTA TTCCCAGCAC GGGTGACTTC TTTTTCTCTT
1501 CATTAGCCAG AGATGACTAA TTTAAATTTA GAACCAGATT TTAATTTAAA
1551 TTAATATTTC CATTAATAAC CTACTCATTG CAGATACCTA TTATACTGTG
1601 TAACAGTTGT TTTGGAAATT TTATGTAAAA TTAAAACTAT CAGTATTTTA 
1651 CAGATGTTTT AATTAGACAT TGTTATTAAC AGGAACAGTG CAGAAACTAG 
1701 AATCAAGCCT TATAATATCT TATAGACCAT GCATTTTTGA AGTTAGTGTC 
1751 CACTAGGGTC CTATTAACTG TACATTTGCA AGATTTCATT ATTTTTGCCT 
1801 CTGACACTAT GGGAAAAATT TTTTAGAAGC TATTGGGACA GATTCAAGCT 
1851 TTTATGCACT TGGTTACTAC AGCTGTAAAA TGAAATCTCG TCTTGTAGCA 
1901 TGGATTATTC TTCTCATGTT AAACCCACCA AAATAAAGGG GACTAAATAG 
1951 GTAATGATTT TCCTAGTGCA TTTGCATACT GTGATAATCC TGGGCCTTGC 
2001 AATAGTTCTA CAGGGCTCTT GGGCATTGAA TTATTAGGAT GTAATTGTAC 
2051 ATCATTGTAG TGTTCACCTT ATTGAAGCTC ACTCTGATGT TAATGAGCTT 
2101 CGGGTTTTGA TGCTTGTTTA GAGATCAGCA GTCTTGGATG GGAGGGAACA 
2151 AAGCTAAATA AATGTTAGTT TGGTGAAAAA AAAAAAAAAA AAAAA 
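
The poly A stretch and polyadenylation signal positions quoted for this insert can be checked against the final sequence row. The following is a minimal sketch, assuming the canonical AATAAA hexamer as the signal and a run of at least 15 A residues as the poly A stretch. The function name, the run threshold and the 0-based coordinate convention are illustrative assumptions; the report does not state how its positions were computed.

```python
import re

def polya_features(fragment, offset, min_run=15):
    """Return (signal_pos, polya_pos) for a sequence fragment.

    `offset` is the number of bases that precede the fragment in the insert.
    Positions are 0-based within the insert (an assumption, not a documented
    convention of the report).
    """
    signal = fragment.find("AATAAA")                # hypothetical signal motif
    run = re.search("A{%d,}" % min_run, fragment)   # first long run of A
    signal_pos = offset + signal if signal != -1 else None
    polya_pos = offset + run.start() if run else None
    return signal_pos, polya_pos

# Last row of the insert (bases 2151-2195), copied from the listing above.
last_row = "".join("AAGCTAAATA AATGTTAGTT TGGTGAAAAA AAAAAAAAAA AAAAA".split())
signal_pos, polya_pos = polya_features(last_row, offset=2150)
print(signal_pos, polya_pos)
```

For this row the 0-based offsets coincide with the quoted figures (2156 and 2175); that agreement is an observation about this insert only.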



BLAST Results 



Entry HS599355 from database EMBL: 
human STS WI-13484. 
Score = 1262, P = 3.6e-52, identities = 274/289 



Medline entries

No Medline entry



Peptide information for frame 2 



ORF from 125 bp to 901 bp; peptide length: 259 
Category: similarity to unknown protein 
Classification: no clue 
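
The stated peptide length follows directly from the ORF coordinates. As a small sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon (the helper name is hypothetical):

```python
def orf_peptide_length(start, end):
    """Number of codons in a 1-based, inclusive ORF span without the stop codon."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3

print(orf_peptide_length(125, 901))  # 259
```

The same arithmetic reproduces the lengths given for the other ORFs in this section (160-1275, 98-1084 and 289-714).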



1 MAACRALKAV LVDLSGTLHI EDAAVPGAQE ALKRLRGASV IIRFVTNTTK
51 ESKQDLLERL RKLEFDISED EIFTSLTAAR SLLERKQVRP MLLVDDRALP
101 DFKGIQTSDP NAVVMGLAPE HFHYQILNQA FRLLLDGAPL IAIHKARYYK
151 RKDGLALGPG PFVTALEYAT DTKATVVGKP EKTFFLEALR GTGCEPEEAV
201 MIGDDCRDDV GGAQDVGMLG ILVKTGKYRA SDEEKINPPP YLTCESFPHA
251 VDHILQHLL
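
The start of the peptide can be reproduced from the nucleotide listing. The sketch below assumes the standard genetic code and uses the first 30 bases of the ORF (bases 125-154 of the insert); the table construction and function name are illustrative, not part of the original report.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# Bases 125-154 of the insert, copied from the sequence listing.
orf_start = "ATGGCAGCATGCCGTGCATTAAAAGCTGTT"
print(translate(orf_start))  # MAACRALKAV
```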



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78dl3, frame 2 

TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid
K08B12., N = 1, Score = 609, P = 2.2e-59

TREMBL:CEC13C4_5 gene: "C13C4.4"; Caenorhabditis elegans cosmid C13C4,
N = 1, Score = 408, P = 4.4e-38



>TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid
K08B12.

Length = 257

HSPs : 

Score = 609 (91.4 bits), Expect = 2.2e-59, P = 2.2e-59 
Identities = 132/251 (52%), Positives = 172/251 (68%) 

Query: 7 LKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERLRKLEFD 66

+ +VL+DLSGT+HIE+ A+PGAQ AL+ LR + + +FVTNTTKESK+ L +RL F
Sbjct: 4 ISSVLIDLSGTIHIEEFAIPGAQTALELLRQHAKV-KFVTNTTKESKRLLHQRLINCGFK 62

Query: 67 ISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPEHFHYQI 126

+ ++EIFTSLTAAR L+ + Q RP +VDDRA+ DF+GI T DPNAVV+GLAPE F+
Sbjct: 63 VEKEEIFTSLTAARDLIVKNQYRPFFIVDDRAMEDFEGISTDDPNAVVIGLAPEKFNDTT 122

Query: 127 LNQAFRLLLDG-APLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKPEKTFF 185

L AFRL+ + A LIAI+K RY++ GL LGPG +V LEY+ +AT+VGKP K FF
Sbjct: 123 LTHAFRLIKEKKASLIAINKGRYHQTNAGLCLGPGTYVAGLEYSAGVEATIVGKPNKLFF 182

Query: 186 LEALRGTG--CEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPPYLT 243

AL+ + AVMIGDD DD GA +GM ILVKTGK+R DE K +
Sbjct: 183 ESALQSLNENVDFSSAVMIGDDVNDDALGAIKIGMRAILVKTGKFRDGDELKVKN----V 238

Query: 244 CESFPHAVDHILQH 257

SF AV+ I+++
Sbjct: 239 ANSFVDAVNMIIEN 252
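
The letters on the middle line of each alignment block mark identical residues. A minimal sketch of that count for the last block (query 244-257 against subject 239-252); the function name is hypothetical and gap columns are assumed to be written as "-":

```python
def count_identities(query, subject):
    """Count aligned columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))

print(count_identities("CESFPHAVDHILQH", "ANSFVDAVNMIIEN"))  # 5
```

The five identities (S, F, A, V, I) correspond to the letters shown on the middle line of that block.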



Pedant information for DKFZphfbr2_78dl3, frame 2 



Report for DKFZphfbr2_78dl3.2



[LENGTH] 259

[MW] 28536.04

[pI] 5.84

[HOMOL] TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid K08B12. 62

[FUNCAT] r general function prediction [M. jannaschii, MJ1437] 3e-05

[SUPFAM] nagD protein 4e-18

[KW] Alpha_Beta

SEQ MAACRALKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERL

PRD ccccccceeeeeecccceeeecccccchhhhhhhhhhccceeeeeeccccchhhhhhhhh 

SEQ RKLEFDISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPE

PRD hhhccccccceeeehhhhhhhhhhhhccceeeeeechhhhhhccccccccceeeeecccc 

SEQ HFHYQILNQAFRLLLDGAPLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKP 

PRD chhhhhhhhhhhhhhccceeeeeccccccccccccccccccchhhhhhhhccceeeeccc 

SEQ EKTFFLEALRGTGCEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPP 

PRD cchhhhhhhhhhccccceeeeecccchhhhhhhhhccceeeeeeeccccccccccccccc 

SEQ YLTCESFPHAVDHILQHLL 

PRD cccccchhhhhhhhhhccc 



(No Prosite data available for DKFZphfbr2_78dl3.2)
(No Pfam data available for DKFZphfbr2_78dl3.2)
DKFZphfbr2_78k24



group: metabolism 

DKFZphfbr2_78k24 encodes a novel 372 amino acid protein with similarity to Mus musculus 
ubiquitin specific protease UBP43. 

The novel protein contains a Prosite ubiquitin carboxyl-terminal hydrolases family 2 signature
2. Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH), also known as deubiquitinating
enzymes, are thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal
glycine of ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors
as well as that of ubiquitinated proteins.

The new protein can find application in modulation of protein stability/degradation in cells. 



Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 



strong similarity to mouse ubiquitin specific protease UBP43 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 1874 bp

Poly A stretch at pos. 1852, polyadenylation signal at pos. 1836



1 AGTCCCGACG TGGAACTCAG CAGCGGAGGC TGGACGCTTG CATGGCGCTT
51 GAGAGATTCC ATCGTGCCTG GCTCACATAA GCGCTTCCTG GAAGTGAAGT
101 CGTGCTGTCC TGAACGCGGG CCAGGCAGCT GCGGCCTGGG GGTTTTGGAG
151 TGATCACGAA TGAGCAAGGC GTTTGGGCTC CTGAGGCAAA TCTGTCAGTC
201 CATCCTGGCT GAGTCCTCGC AGTCCCCGGC AGATCTTGAA GAAAAGAAGG
251 AAGAAGACAG CAACATGAAG AGAGAGCAGC CCAGAGAGCG TCCCAGGGCC
301 TGGGACTACC CTCATGGCCT GGTTGGTTTA CACAACATTG GACAGACCTG
351 CTGCCTTAAC TCCTTGATTC AGGTGTTCGT AATGAATGTG GACTTCACCA
401 GGATATTGAA GAGGATCACG GTGCCCAGGG GAGCTGACGA GCAGAGGAGA
451 AGCGTCCCTT TCCAGATGCT TCTGCTGCTG GAGAAGATGC AGGACAGCCG
501 GCAGAAAGCA GTGCGGCCCC TGGAGCTGGC CTACTGCCTG CAGAAGTGCA
551 ACGTGCCCTT GTTTGTCCAA CATGATGCTG CCCAACTGTA CCTCAAACTC
601 TGGAACCTGA TTAAGGACCA GATCACTGAT GTGCACTTGG TGGAGAGACT
651 GCAGGCCCTG TATACGATCC GGGTGAAGGA CTCCTTGATT TGCGTTGACT
701 GTGCCATGGA GAGTAGCAGA AACAGCAGCA TGCTCACCCT CCCACTTTCT
751 CTTTTTGATG TGGACTCAAA GCCCCTGAAG ACACTGGAGG ACGCCCTGCA
801 CTGCTTCTTC CAGCCCAGGG AGTTATCAAG CAAAAGCAAG TGCTTCTGTG
851 AGAACTGTGG GAAGAAGACC CGTGGGAAAC AGGTCTTGAA GCTGACCCAT
901 TTGCCCCAGA CCCTGACAAT CCACCTCATG CGATTCTCCA TCAGGAATTC
951 ACAGACGAGA AAGATCTGCC ACTCCCTGTA CTTCCCCCAG AGCTTGGATT
1001 TCAGCCAGAT CCTTCCAATG AAGCGAGAGT CTTGTGATGC TGAGGAGCAG
1051 TCTGGAGGGC AGTATGAGCT TTTTGCTGTG ATTGCGCACG TGGGAATGGC
1101 AGACTCCGGT CATTACTGTG TCTACATCCG GAATGCTGTG GATGGAAAAT
1151 GGTTCTGCTT CAATGACTCC AATATTTGCT TGGTGTCCTG GGAAGACATC
1201 CAGTGTACCT ACGGAAATCC TAACTACCAC TGGCAGGAAA CTGCATATCT
1251 TCTGGTTTAC ATGAAGATGG AGTGCTAATG GAAATGCCCA AAACCTTCAG
1301 AGATTGACAC GCTGTCATTT TCCATTTCCG TTCCTGGATC TACGGAGTCT
1351 TCTAAGAGAT TTTGCAATGA GGAGAAGCAT TGTTTTCAAA CTATATAACT
1401 GAGCCTTATT TATAATTAGG GATATTATCA AAATATGTAA CCATGAGGCC
1451 CCTCAGGTCC TGATCAGTCA GAATGGATGC TTTCACCAGC AGACCCGGCC
1501 ATGTGGCTGC TCGGTCCTGG GTGCTCGCTG CTGTGCAAGA CATTAGCCCT
1551 TTAGTTATGA GCCTGTGGGA ACTTCAGGGG TTCCCAGTGG GGAGAGCAGT
1601 GGCAGTGGGA GGCATCTGGG GGCCAAAGGT CAGTGGCAGG GGGTATTTCA
1651 GTATTATACA ACTGCTGTGA CCAGACTTGT ATACTGGCTG AATATCAGTG
1701 CTGTTTGTAA TTTTTCACTT TGAGAACCAA CATTAATTCC ATATGAATCA
1751 AGTGTTTTGT AACTGCTATT CATTTATTCA GCAAATATTT ATTGATCATC
1801 TCTTCTCCAT AAGATAGTGT GATAAACACA GTCATGAATA AAGTTATTTT
1851 CCACAAAAAA AAAAAAAAAA AAAA



BLAST Results 



Entry AC005500 from database EMBL: 
, complete sequence. 

Score = 859, P = 5.7e-143, identities = 175/179 
8 exons matching Bp 317-1230

Medline entries



99182491: 

A novel ubiquitin-specific protease, UBP43, cloned from leukemia
fusion protein AML1-ETO-expressing mice, functions in
hematopoietic cell differentiation.



Peptide information for frame 1 



ORF from 160 bp to 1275 bp; peptide length: 372 
Category: strong similarity to known protein 
Classification: Protein management 
Prosite motifs: UCH_2_2 (302-320)



1 MSKAFGLLRQ ICQSILAESS QSPADLEEKK EEDSNMKREQ PRERPRAWDY 

51 PHGLVGLHNI GQTCCLNSLI QVFVMNVDFT RILKRITVPR GADEQRRSVP 

101 FQMLLLLEKM QDSRQKAVRP LELAYCLQKC NVPLFVQHDA AQLYLKLWNL 

151 IKDQITDVHL VERLQALYTI RVKDSLICVD CAMESSRNSS MLTLPLSLFD 

201 VDSKPLKTLE DALHCFFQPR ELSSKSKCFC ENCGKKTRGK QVLKLTHLPQ 

251 TLTIHLMRFS IRNSQTRKIC HSLYFPQSLD FSQILPMKRE SCDAEEQSGG 

301 QYELFAVIAH VGMADSGHYC VYIRNAVDGK WFCFNDSNIC LVSWEDIQCT
351 YGNPNYHWQE TAYLLVYMKM EC 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78k24, frame 1

TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds., N = 1,
Score = 1367, P = 1e-139

SWISSPROT:UBPE_DROME UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 64E (EC
3.1.2.15) (UBIQUITIN THIOLESTERASE 64E) (UBIQUITIN-SPECIFIC PROCESSING
PROTEASE 64E) (DEUBIQUITINATING ENZYME 64E)., N = 2, Score = 248, P =
5.3e-33



>TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds. 
Length = 368 

HSPs : 



Score = 1367 (205.1 bits), Expect = 1.0e-139, P = 1.0e-139 
Identities = 262/369 (71%), Positives = 295/369 (79%) 
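
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded (295/369 is about 79.9 %, yet 79 % is shown). A short sketch of that arithmetic, assuming integer truncation; the function name is illustrative and the truncation rule is inferred from the figures in this section, not documented by the report:

```python
def blast_percent(matches, length):
    """Percentage of matching columns, truncated to a whole number."""
    return 100 * matches // length

print(blast_percent(262, 369), blast_percent(295, 369))  # 71 79
```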



Query: 1 MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 60

M K FGLLR+ CQS++AE Q A LEE E KR R+ AWD PHGLVGLHNI
Sbjct: 1 MGKGFGLLRKPCQSVVAEPQQYSA-LEE--ERTMKRKRVLSRDLCSAWDSPHGLVGLHNI 57

Query: 61 GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 120

GQTCCLNSL+QVF+MN+DF ILKRITVPR A+E++RSVPFQ+LLLLEKMQDSRQKA+ P
Sbjct: 58 GQTCCLNSLLQVFMMNMDFRMILKRITVPRSAEERKRSVPFQLLLLLEKMQDSRQKALLP 117

Query: 121 LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 180

EL CLQK NVPLFVQHDAAQLYL +WNL KDQITD L ERLQ L+TI ++SLICV
Sbjct: 118 TELVQCLQKYNVPLFVQHDAAQLYLTIWNLTKDQITDTDLTERLQGLFTIWTQESLICVG 177

Query: 181 CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 240

C ESSR S +LTL L LFD D+KPLKTLEDAL CF QP+EL+S C CE CG+KT K
Sbjct: 178 CTAESSRRSKLLTLSLPLFDKDAKPLKTLEDALRCFVQPKELASSDMC-CETCGEKTPWK 236

Query: 241 QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 300

QVLKLTHLPQTLTIHLMRFS RNS+T KICHS+ FPQSLDFSQ+LP + + D +EQS
Sbjct: 237 QVLKLTHLPQTLTIHLMRFSARNSRTEKICHSVNFPQSLDFSQVLPTEEDLGDTKEQSEI 296

Query: 301 QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 360

YELFAVIAHVGMAD GHYC YIRN VDGKWFCFNDS++C V+W+D+QCTYGN Y W+E
Sbjct: 297 HYELFAVIAHVGMADFGHYCAYIRNPVDGKWFCFNDSHVCWVTWKDVQCTYGNHRYRWRE 356

Query: 361 TAYLLVYMK 369

TAYLLVY K
Sbjct: 357 TAYLLVYTK 365



Pedant information for DKFZphfbr2_78k24, frame 1 



Report for DKFZphfbr2_78k24.1



[LENGTH] 372

[MW] 43011.12

[pI] 8.05

[HOMOL] TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus musculus
ubiquitin specific protease UBP43 mRNA, complete cds. 1e-151

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YMR304w] 3e-19

[FUNCAT] [entry illegible in source]

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation,
farnesylation and processing) [S. cerevisiae, YMR223w] 1e-15

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 6e-12

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 9e-11

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 9e-11

[BLOCKS] BL00582A Ribosomal protein L33 proteins

[BLOCKS] BL00972E

[BLOCKS] BL00972D

[BLOCKS] BL00972A

[EC] 2.4.2.29 Queuine tRNA-ribosyltransferase 1e-06

[PIRKW] pentosyltransferase 1e-06

[PIRKW] glycosyltransferase 1e-06

[PIRKW] tRNA modification 1e-06

[PIRKW] alternative splicing 7e-11

[PIRKW] hydrolase 7e-06

[SUPFAM] deubiquinating enzyme SSV7 2e-09

[PROSITE] UCH_2_2 1

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2

[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2

[KW] Alpha_Beta







SEQ MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI

PRD cccceeechhhhhhhhcccccccchhhhhhhhcccccccccccccccccccccccccccc 

SEQ GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 

PRD cceeehhhhhhhhhcccchhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhccccc 

SEQ LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 

PRD hhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhheeeee 

SEQ CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhcccccccceeecccccccccc 

SEQ QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 

PRD cceeeecccchhhhhhhhhhhccchhhhhccccccccccccccccccccccccccccccc 

SEQ QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 

PRD eeeeeeeeeeeccccccceeeeeecccccceeeeccceeeeeecccccccccccccchhh 

SEQ TAYLLVYMKMEC 

PRD hhhhhhhhhccc 



Prosite for DKFZphfbr2_78k24.1
PS00973 302->320 UCH_2_2 PDOC00750 
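
The residues covered by a Prosite hit can be read out of the SEQ track by slicing. A minimal sketch, treating both reported coordinates as 1-based and inclusive (an assumption about the report's convention) and using the SEQ row that begins at residue 301; the helper name is hypothetical:

```python
def motif_residues(row, row_start, first, last):
    """Slice residues first..last (1-based, inclusive) out of a row that
    begins at residue number row_start."""
    return row[first - row_start : last - row_start + 1]

# SEQ row covering residues 301-360 of the 372 amino acid protein.
row_301 = "QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE"
print(motif_residues(row_301, 301, 302, 320))  # YELFAVIAHVGMADSGHYC
```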



Pfam for DKFZphfbr2_78k24.1



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GIqNlGNTCYMNSIIQCL* 

G+ N+G TC +NS+IQ+ 
Query 56 GLHNIGQTCCLNSLIQVF 73

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2

HMM *YdLYgVICHYGntldyGHYWaYVKNenhHRWkWYYFDDEtV*

Y+L++VI H G D+GHY +Y++N ++KW++F+D+++
Query 302 YELFAVIAHVG-MADSGHYCVYIRNAV--DGKWFCFNDSNI 339

DKFZphfbr2_78n23



group: brain derived 

DKFZphfbr2_78n23 encodes a novel 329 amino acid protein with similarity to A.thaliana 
F26P21.80 protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A.thaliana F26P21.80
Sequenced by MediGenomix 

Locus: /map="89.1 cR from top of Chr19 linkage group"
Insert length: 1447 bp

Poly A stretch at pos. 1374, polyadenylation signal at pos. 1353



1 TACAACTTCC GGCTGTAAAG ATGGCGGCTT CCTAGTGAGT CGGCGGCTGA 

51 CTTAGAAGGA GGTTCAGGCT ACGGTGAGCC GAAGCCACAC AGGAGCCATG 

101 GAAGTGGCAG AGCCCAGCAG CCCCACTGAA GAGGAGGAGG AGGAAGAGGA 

151 GCACTCGGCA GAGCCTCGGC CCCGCACTCG CTCCAATCCT GAAGGGGCTG 

201 AGGACCGGGC AGTAGGGGCA CAGGCCAGCG TGGGCAGCCG CAGCGAGGGT 

251 GAGGGTGAGG CCGCCAGTGC TGATGATGGG AGCCTCAACA CTTCAGGAGC 

301 CGGCCCTAAG TCCTGGCAGG TGCCCCCGCC AGCCCCTGAG GTCCAAATTC 

351 GGACACCAAG GGTCAACTGT CCAGAGAAAG TGATTATCTG CCTGGACCTG 

401 TCAGAGGAAA TGTCACTGCC AAAGCTGGAG TCGTTCAACG GCTCCAAAAC 

451 CAACGCCCTC AATGTCTCTC AGAAGATGAT TGAGATGTTC GTGCGGACAA

501 AACACAAGAT CGACAAAAGC CACGAGTTTG CACTGGTGGT GGTGAACGAT 

551 GACACGGCCT GGCTGTCTGG CCTGACCTCC GACCCCCGCG AGCTCTGTAG 

601 CTGCCTCTAT GATCTGGAGA CGGCCTCCTG TTCCACCTTC AATCTGGAAG 

651 GACTTTTCAG CCTCATCCAG CAGAAAACTG AGCTTCCGGT CACAGAGAAC 

701 GTGCAGACGA TTCCCCCGCC ATATGTGGTC CGCACCATCC TTGTCTACAG 

751 CCGTCCACCT TGCCAGCCCC AGTTCTCCTT GACGGAGCCC ATGAAGAAAA 

801 TGTTCCAGTG CCCATATTTC TTCTTTGACG TTGTTTACAT CCACAATGGC 

851 ACTGAGGAGA AGGAGGAGGA GATGAGTTGG AAGGATATGT TTGCCTTCAT 

901 GGGCAGCCTG GATACCAAGG GTACCAGCTA CAAGTATGAG GTGGCACTGG 

951 CTGGGCCAGC CCTGGAGTTG CACAACTGCA TGGCGAAACT GTTGGCCCAC 

1001 CCCCTGCAGC GGCCTTGCCA GAGCCATGCT TCCTACAGCC TGCTGGAGGA 

1051 GGAGGATGAA GCCATTGAGG TTGAGGCCAC TGTCTGAACC ATCCCTGTAC 

1101 ATCTGCACCT TCTTGTGCAA GGAAGTCCTT GGCCTAAAGC CTTGGTTCTC 

1151 AAACTGGGTT CCTTGGGACC TCCGGGGTGG GGGGGTTCCA GGAGGCACGT 

1201 AGGGTACCTT GCAGGGTCCT AGGAGGGAAA CCCAGGATTC CAGGAGGGAT 

1251 CCCAGGAACT GTGGGCACCC ATTTTCTGTG TCTCCCAGCC CATTTCCACT 

1301 CCTAGTTTGT CATGGATAAT TTTTGTTCTT CCCTGTGTGA TTTTTGCCAT 

1351 CAAAATAAAA ATTTGAGACT CGTTAAAAAA AAAAAAAAAA AAAAAAAAAA 

1401 AAAAAAAAAA AAAAAAAAAA AAAAAAGAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry HS806352 from database EMBL: 
human STS EST192543. 
Score = 1285, P = 2.5e-51, identities = 263/266 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 98 bp to 1084 bp; peptide length: 329 
Category: similarity to unknown protein 
Classification: no clue 

1 MEVAEPSSPT EEEEEEEEHS AEPRPRTRSN PEGAEDRAVG AQASVGSRSE
51 GEGEAASADD GSLNTSGAGP KSWQVPPPAP EVQIRTPRVN CPEKVIICLD
101 LSEEMSLPKL ESFNGSKTNA LNVSQKMIEM FVRTKHKIDK SHEFALVVVN
151 DDTAWLSGLT SDPRELCSCL YDLETASCST FNLEGLFSLI QQKTELPVTE
201 NVQTIPPPYV VRTILVYSRP PCQPQFSLTE PMKKMFQCPY FFFDVVYIHN
251 GTEEKEEEMS WKDMFAFMGS LDTKGTSYKY EVALAGPALE LHNCMAKLLA
301 HPLQRPCQSH ASYSLLEEED EAIEVEATV


BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78n23, frame 2 

PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana, N = 
1, Score = 142, P = 1.5e-07 



>PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 
Length = 264

HSPs:

Score = 142 (21.3 bits), Expect = 1.5e-07, P = 1.5e-07
Identities = 56/216 (25%), Positives = 97/216 (44%) 

Query: 93 EKVIICLDL-SEEMSLPKLESFNGSKTNALNVSQKMIEMFVRTKHKIDKSHEFALVVVND 151

E ++IC+D+ +E M K NG + ++ I +F+ K I+ H FA +
Sbjct: 26 EDILICIDVDAESMVEMKTTGTNGRPLIRMECVKQAIILFIHNKLSINPDHRFAFATLAK 85

Query: 152 DTAWLSG-LTSDPRELCSCLYDLE-TASCSTFNLEGLFSLIQQKTELPVTENVQTIPPPY 209

AWL TSD + L L S S +L LF Q+ ++ +N
Sbjct: 86 SAAWLKKEFTSDAESAVASLRGLSGNKSSSRMLTI.L-RAAAQEAKVSRAQN R 138

Query: 210 VVRTILVYSRPPCQPQFSLTEPMKKMFQCPYFFFDVVYIHNGTEEKEEEMSWKDMF-AFM 268

+ R IL+Y R +P P+ + F DV+Y+H ++ + +D++ + +
Sbjct: 139 IFRVILIYCRSSMRPTHEW--PLNQKL----FTLDVMYLH---DKPSPDNCPQDVYDSLV 189

Query: 269 GSLD--TKGTSYKYEVALAGPALELHNCMAKLLAHPLQRPCQ 308

+++ ++ Y +E G A + M+ LL HP QR Q
Sbjct: 190 DAVEHVSEYEGYIFESG-QGLARSVFKPMSMLLTHPQQRCAQ 230



Pedant information for DKFZphfbr2_78n23, frame 2 



Report for DKFZphfbr2_78n23.2



[LENGTH] 329 

[MW] 36560.10 

[pI] 4.60

[HOMOL] PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 7e-07 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 9.73 % 

SEQ MEVAEPSSPTEEEEEEEEHSAEPRPRTRSNPEGAEDRAVGAQASVGSRSEGEGEAASADD 

SEG . xxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhccccccccccccccc 

SEQ GSLNTSGAGPKSWQVPPPAPEVQIRTPRVNCPEKVIICLDLSEEMSLPKLESFNGSKTNA 

SEG 

PRD ccccccccccccccccccccceeeccccccccceeeeeccccccccccccccccccccee 

SEQ LNVSQKMIEMFVRTKHKIDKSHEFALVVVNDDTAWLSGLTSDPRELCSCLYDLETASCST

SEG 

PRD ehhhhhhhhhhhhhhhccccccceeeeeeccchhhhhcccccchhhhhhhhhcccccccc 

SEQ FNLEGLFSLIQQKTELPVTENVQTIPPPYVVRTILVYSRPPCQPQFSLTEPMKKMFQCPY 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccceeeeeeecccccccccccchhhhhheeee 

SEQ FFFDVVYIHNGTEEKEEEMSWKDMFAFMGSLDTKGTSYKYEVALAGPALELHNCMAKLLA 

SEG 

PRD eeeeeeeeccccchhhhhhhhhhhhhhhhcccccccceeeeecccccchhhhhhhhhhhh 

SEQ HPLQRPCQSHASYSLLEEEDEAIEVEATV 

SEG xxxxxxxxxx . . . 

PRD hcccccccccchhhhhhhhhhhhhhhccc
(No Prosite data available for DKFZphfbr2_78n23.2)
(No Pfam data available for DKFZphfbr2_78n23.2)
DKFZphfbr2_7a24



group: brain derived 

DKFZphfbr2_7a24 encodes a novel 142 amino acid protein with similarity to the C-terminal part 
of transforming growth factor-beta activated kinases. 

The novel protein shows only similarity to the C-terminus of such kinases; no kinase domain is
present.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to C-terminus of TGF-beta-activated kinase 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown

Insert length: 1697 bp

No poly A stretch found, no polyadenylation signal found 



1 GGGGAGAGAG GGGTTGTGAA GGGAAGCGGA AGGGAAGGGA AGGGAGGTCC 

51 CGTGGGACGC TGGGGTCTGG GGTAGAGCAG GTAGCAGCGT GCTGCCCTGA 

101 CAGCTGTCTC CGCTCCTCAG ATTGTCAGTG GCTGCTATGC AGCAGGTGCA 

151 GCCTGGTCTC TCACTGAGTC TCTACTCCAC AAAGGCAACG ACTGGCCAAG 

201 GCAGTGGCTG GCTCTGGGTT ACACAAGTGC AGACACTCAA CTAAGTGAGC 

251 TGGAAGACCC AGGAGAAGGC GGAGGCTCAG GTGCCCACAT GATCAGCACA 

301 GCCAGGGTAC CTGCTGACAA GCCTGTACGC ATCGCCTTTA GCCTCAATGA 

351 CGCCTCAGAT GATACACCCC CTGAAGACTC CATTCCTTTG GTCTTTCCAG 

401 AATTAGACCA GCAGCTACAG CCCCTGCCGC CTTGTCATGA CTCCGAGGAA 

451 TCCATGGAGG TGTTCAGACA GCACTGCCAA ATAGCAGAAG AATACCTTGA

501 GGTCAAAAAG GAAATCACCC TGCTTGAGCA AAGGAAGAAG GAGCTCATTG 

551 CCAAGTTAGA TCAGGCAGAA GAGGAGAAGG TGGATGCTGC TGAGCTGGTT 

601 CGGGAATTCG AGGCTCTGAC GGAGGAGAAT CGGACGTTGA GGTTGGCCCA 

651 GTCTCAATGT GTGGAACAAC TGGAGAAACT TCGAATACAG TATCAGAAGA 

701 GGCAGGGCTC GTCCTAACTT TAAATTTTTC AGTGTGAGCA TACGAGGCTG 

751 ATGACTGCCC TGTGCTGGCC AAAAGATTTT TATTTTAAAT GAATAGTGAG 

801 TCAGATCTAT TGCTTCTCTG TATTACCCAC ATGACAACTG TCTATAATGA 

851 GTTTACTGCT TGCCAGCTTC TAGCTTGAGA GAAGGGATAT TTTAAATGAG 

901 ATCATTAACG TGAAACTATT ACTAGTATAT GTTTTTGGAG ATCAGAATTC 

951 TTTTCCAAAG ATATATGTTT TTTTCTTTTT TAGGAAGATA TGATCATGCT 

1001 GTACAACAGG GTAGAAAATG GTAAAAATAG ACTATTGACT GACCCAGCTA 

1051 AGAATCGCGG GCTGAGCAGA GTTAAACCAT GGGACAAACC CATAACATGT 

1101 TCACCATAGT TTCACGTATG TGTATTTTTA AATTTCATGC CTTTAATATT 

1151 TCAAATATGC TCAAATTTAA ACTGTCAGAA ACTTCTCTGC ATGTATTTAT 

1201 ATTTGCCAGA GTATAAACTT TTATACTCTG ATTTTTATCC TTCAATGATT 

1251 GATTATACTA AGAATAAATG GTCACATATC CTAAAAGCTT CTTCATGAAA 

1301 TTATTAGCAG AAACCATGTT TGAAACCAAA GCACATTTGC CAATGCTAAC 

1351 TGGCTGTTGT AATAATAAAC AGATAAGGCT GCATTTGCTT CATGCCATGT 

1401 GACCTCACAG TAAACATCTC TGCCTTTGCC TGTGTGTGTT CTGGGGGAGG 

1451 GGGGACATGG AAAAATATTG TTTGGACATT ACTTGGGTGA GTGCCCATGA

1501 AGACATCAGT GAACTTGTAA CTATTGTTTT GTTTTGGATT TAAGGAGATG 

1551 TTTTAGATCA GTAACAGCTA ATAGGAATAT GCGAGTAAAT TCAGAATTGA 

1601 AACAATTTCT CCTTGTTCTA CCTATCACCA CATTTTCTCA AATTGAACTC 

1651 TTTGTTATAT GTCCATTTCT ATTCATGTAA CTTCTTTTTC ATTAAAC



BLAST Results 



No BLAST result 



Medline entries 



98130593: 

Role of TAK1 and TAB1 in BMP signaling in early Xenopus
development.

Peptide information for frame 1



ORF from 289 bp to 714 bp; peptide length: 142 
Category: similarity to known protein 



1 MISTARVPAD KPVRIAFSLN DASDDTPPED SIPLVFPELD QQLQPLPPCH 
51 DSEESMEVFR QHCQIAEEYL EVKKEITLLE QRKKELIAKL DQAEEEKVDA 
101 AELVREFEAL TEENRTLRLA QSQCVEQLEK LRIQYQKRQG SS

BLASTP hits 

Entry U92030_1 from database TREMBL:

product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA,
complete cds.

Score = 343, P = 1.3e-30, identities = 69/143, positives = 104/143

Entry AB009356_1 from database TREMBL:

product: "TGF-beta activated kinase 1a"; Homo sapiens mRNA for
TGF-beta activated kinase 1a, complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry MMPK_1 from database TREMBL:

product: "TAK1 (TGF-beta-activated kinase)"; Mouse mRNA for TAK1
(TGF-beta-activated kinase), complete cds.

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143

Entry AB009357_1 from database TREMBL:

product: "TGF-beta activated kinase 1b"; Homo sapiens mRNA for
TGF-beta activated kinase 1b, complete cds.

Score = 339, P = 3.2e-30, identities = 67/143, positives = 104/143

Entry AB009358_1 from database TREMBL:

product: "TGF-beta activated kinase 1c"; Homo sapiens mRNA for
TGF-beta activated kinase 1c, complete cds.

Score = 144, P = 3.8e-09, identities = 30/67, positives = 47/67



Alert BLASTP hits for DKFZphfbr2_7a24, frame 1 

PIR:JC5955 transforming growth factor-beta activated kinase (EC
-.-.-.-) 1a - Human, N = 1, Score = 339, P = 3e-30

>PIR:JC5955 transforming growth factor-beta activated kinase (EC -.-.-.-) 1a
- Human

Length = 579

HSPs:

Score = 339 (50.9 bits), Expect = 3.0e-30, P = 3.0e-30 
Identities = 67/143 (46%), Positives = 104/143 (72%) 

Query: 1 MISTARVPADKPVRI-AFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVF 59 

MI+T+ ++KP R ++ +D++D ++SIP+ + LD QLQPL PC +S+ESM VF 
Sbjct: 437 MITTSGPTSEKPTRSHPWTPDDSTDTNGSDNSIPMAYLTLDHQLQPLAPCPNSKESMAVF 496 

Query: 60 RQHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRL 119 

QHC++A+EY++V+ EI LL QRK+EL+A+LDQ E+++ + + LV+E + L +EN++L 
Sbjct: 497 EQHCKMAQEYMKVQTEIALLLQRKQELVAELDQDEKDQQNTSRLVQEHKKLLDENKSLST 556 

Query: 120 AQSQCVEQLEKLRIQYQKRQGSS 142 

QC +QLE +R Q QKRQG+S 
Sbjct: 557 YYQQCKKQLEVIRSQQQKRQGTS 579 

Pedant information for DKFZphfbr2_7a24, frame 1 

Report for DKFZphfbr2_7a24 . 1 

[LENGTH] 142 

[MW] 16377.53 

[pI] 4.64

[HOMOL] TREMBL:U92030_1 product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1
mRNA, complete cds. 6e-26

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] TNFR/NGFR cysteine-rich region

[KW] All_Alpha

[KW] LOW_COMPLEXITY 7.04 %

[KW] COILED_COIL 33.10 % 
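
The low-complexity keyword percentage is consistent with the number of residues masked (marked x) in the SEG track divided by the protein length. A small sketch of that check; the masked counts (10 for this 142 residue protein, 32 for the 329 residue DKFZphfbr2_78n23 protein) are read off the SEG tracks and should be treated as assumptions, as should the helper name:

```python
def masked_percent(masked, length):
    """Share of residues flagged by a masking track, as a percentage."""
    return round(100 * masked / length, 2)

print(masked_percent(10, 142))  # 7.04
print(masked_percent(32, 329))  # 9.73
```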



SEQ MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVFR 

SEG xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccccccccchhhhhhh 

COILS 

SEQ QHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhh 

COILS . . .CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QSQCVEQLEKLRIQYQKRQGSS

SEG 

PRD hhhhhhhhhhhhhhhhhhhccc 

COILS 



Prosite for DKFZphfbr2_7a24.1

PS00001 114->118 ASN_GLYCOSYLATION PDOC00001
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00006 18->22 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006



Pfam for DKFZphfbr2_7a24.1



HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtDWNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC*

C++++ + + +Q C++ E+ ++++++ T + ++
Query 49 CHDSEESMEVF-RQH--CQIAEE--YLEVKKEITLLEQRKK 84

DKFZphfbr2_7e22



group: brain derived 

DKFZphfbr2_7e22.2 encodes a novel 286 amino acid protein similar to b561 cytochromes.

The new protein shows strong similarity to b561 cytochromes, but contains no heme binding
site. In addition, a myc-type helix-loop-helix dimerization domain is present. This
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as
the myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins
that bind specific DNA sequences in various immunoglobulin chain enhancers.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes.



strong similarity to cytochrome b561

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown

Insert length: 4254 bp

Poly A stretch at pos. 4234, polyadenylation signal at pos. 4217



1 GGGGACTACC CAGAGGGCTG CCGCCGCCTC TCCAAGTTCT TGTGGCCCCC
51 GCGGTGCGGA GTATGGGGCG CTGATGGCCA TGGAGGGCTA CCGGCGCTTC 
101 CTGGCGCTGC TGGGGTCGGC ACTGCTCGTC GGCTTCCTGT CGGTGATCTT 
151 CGCCCTCGTC TGGGTCCTCC ACTACCGAGA GGGGCTTGGC TGGGATGGGA 
201 GCGCACTAGA GTTTAACTGG CACCCAGTGC TCATGGTCAC CGGCTTCGTC 
251 TTCATCCAGG GCATCGCCAT CATCGTCTAC AGACTGCCGT GGACCTGGAA 
301 ATGCAGCAAG CTCCTGATGA AATCCATCCA TGCAGGGTTA AATGCAGTTG 
351 CTGCCATTCT TGCAATTATC TCTGTGGTGG CCGTGTTTGA GAACCACAAT 
401 GTTAACAATA TAGCCAATAT GTACAGTCTG CACAGCTGGG TTGGACTGAT 
451 AGCTGTCATA TGCTATTTGT TACAGCTTCT TTCAGGTTTT TCAGTCTTTC 
501 TGCTTCCATG GGCTCCGCTT TCTCTCCGAG CATTTCTCAT GCCCATACAT 
551 GTTTATTCTG GAATTGTCAT CTTTGGAACA GTGATTGCAA CAGCACTTAT 
601 GGGATTGACA GAGAAACTGA TTTTTTCCCT GAGAGATCCT GCATACAGTA 
651 CATTCCCGCC AGAAGGTGTT TTCGTAAATA CGCTTGGCCT TCTGATCCTG 
701 GTGTTCGGGG CCCTCATTTT TTGGATAGTC ACCAGACCGC AATGGAAACG 
751 TCCTAAGGAG CCAAATTCTA CCATTCTTCA TCCAAATGGA GGCACTGAAC 
801 AGGGAGCAAG AGGTTCCATG CCAGCCTACT CTGGCAACAA CATGGACAAA 
851 TCAGATTCAG AGTTAAACAA TGAAGTAGCA GCAAGGAAAA GAAACTTAGC 
901 TCTGGATGAG GCTGGGCAGA GATCTACCAT GTAAAATGTT GTAGAGATAG 
951 AGCCATATAA CGTCACGTTT CAAAACTAGC TCTACAGTTT TGCTTCTCCT 
1001 ATTAGCCATA TGATAATTGG GCTATGTAGT ATCAATATTT ACTTTAATCA 
1051 CAAAGGATGG TTTCTTGAAA TAATTTGTAT TGATTGAGGC CTATGAACTG 
1101 ACCTGAATTG GAAAGGATGT GATTAATATA AATAATAGCA GATATAAATT 
1151 GTGGTTATGT TACCTTTATC TTGTTGAGGA CCACAACATT AGCACGGTGC 
1201 CTTGTGCAGA ATAGATACTC AATATGTGAA TATGTGTCTA CTAGTAGTTA 
1251 ATTGGATAAA CTGGCAGCAT CCCTGGCCTG TTGTCATGCA GTCATTTCCT 
1301 GTTAATTCTG GGAGACAATG ATTTCACAAC TAGAGGGAAG CAGTCCTAAA 
1351 AGTTTAAAAT CCGATAAGGA ATATCTGGGA CAGGGTTTAG ATCATGACTC 
1401 TACACAGATA CCATGATGAG AGTATATTAA AGAAATTTAG GAAAGCACCT 
1451 GGTTCCTTTC TCCCCATGCC TGCCTTCTGC TCCCTCCCCA GCTGGTTTGG 
1501 GCTCAAATTG TCCCTGGAGA CTAGGGTTTA TGTTAGGGTA TTGATAGATT 
1551 AGAGCAGGTG GTTGAAGAGA TCTTCTCTGG TCAGACTTGG AAGAATTTCC 
1601 AAAAGTGAAG TTAGCCCCAA GACTTCCCTA GGGTTGATGT ACTTTATGAT 
1651 CCAGATGCTA AACTTCTTAG AATGAAAATA TGCTTCAACA CTTAAGTAGC 
1701 ATACACTGCC CTACAAACCT CAGAGAGCAC TTTTCCCCAA GTTCTTGTTT 
1751 TTATTTTTGA AAGTACTCAC ACAGCACTTA CTATGCTCCA AACACTCCTC 
1801 TAAGCACTTT ACACATATTA GCTCATTCAG TCCCCAGACA GACGGGATGA 
1851 AGTAGGTATT GTTACTGTTC CCATTTTACA GGTGAGAGAT TTGAAGCCTG 
1901 GGGAGGCTAG TAACTCACCC CAAGGTCACA CGGCTCATAC ATGGTGGGAC 
1951 TGAGACTCAG ATGCAGGCAG TCTGGCACCT CAGTCTGGAT TCTAACCATT 
2001 TCACTAAGCT ATTTTTGTCT TGTACTACTT TGACCCACCC CTGAATAAAC 
2051 CTCAATTGCT GGAGTGGGGT GTAGTTATTA AAGGGATGCT TTTTACCTTT 
2101 TGCTGTCTGC TGTGGCAGAT TCCCCAGATA ACCAAGGAAA AGGGGCCACC 
2151 CATACCTGGA AATAGGCCAT AGGGCCCCTA CTACTGCCAA CAAGCCATGG 
2201 CCTACCTTGA CACTTGTTTG ATCTTAAAAT TGTGTCTTGG TAACAAAAGA 
2251 TTTGGACAGG CATATCTGTA GCTTTCAAGT TAATTAATTG CAATATTTTT 
2301 TTCTTCAGGA TTTTAGCTGC TGAACAACTT TCAGTTTGGA GCTAAAAGAG 
2351 ACCTGTCTCA TGGTCTGCCC TTCCCTGGGG CAATAGCTAG GGTCTTTCCT 
2401 GATTTTTATG GAATTTTAGG GGATATTTTG AGCTTTGGGT TCTCAGTAGT 
2451 GAATTGAGAC TTGGAGGTGA CTTTTCATGT TTGGAGTATC ATCTCTGTCT 
2501 GGGCTCTGGG CTGACAAATT AAAACCTAGA GTAGTGCTTA TGCTGAAATG 
2551 ATACTTTTCA TTTTTTGGTT GATTTTTTTG CCTTCCCTTC AATTTTAAAC 
2601 TGAAGCATTT TAATGTGGGT AGAAACTCTA CACCAAATAC ACTAAACATT 
2651 TTGGTGCTTA GTGGATTTCT TTTTAGGTAA CTGGTACTTA CTTCCAAAGA 
2701 CTGAATACAA GCCACACTCC ATCATATCCC TTAAACTTCA TGAAAAACCA 
2751 TTCAAGATCC CCTTGCTGCA ACACTGTTCT CTTCTTCTCT ACTAAATTCT 
2801 ATTTCCAAAA TTGGTAATAG AGCCAGAAGG ATCCCCAGTA CCCAGCCCTC 
2851 TGCCTGGCAC AAAGTGGTAG CACAATTAAA TTCAGTATGG GTGGAGCATG 
2901 GTACAGTCTT GGTGCCATAG AAGGAGTAGT TGCATAGTCA CACATCATTT 
2951 GATAAGTTGG ATGTTCCATT ACATAGAGGA ACACAAAATT CCAGGGTTTT 
3001 TGGAGGAAGG GATTAGATAG CGACTAAGCC GCCAGAATTG AGGTGGCCAT 
3051 TCCTTTTTGT ATAGGCTAAG AAACAGGTTA TCAGTGAAAA GTTAATTATG 
3101 GCTTTGGCAC TAGAATAGCA CTGTTGCAAA GTATTTAAGC ACCCCCCATC 
3151 TCAGCCCTTT ATTTTATCTT TCATGTGGGC TAATGTGAGG ATAATCTTAC 
3201 AGATATTATA GGAATTTCTT TTCTATCTTT ATGAAAACAA CGTATATAAA 
3251 ATATATCTAG AAAACCTTTG TTTGAGACTC TTATTTAATG GGCTTTTGAT 
3301 TCTAATGATA ATTGTACCTT TATCTTTCAA AAGCTGATAT TTCCTACCTA 
3351 AGCATCTCCC GAGAAAAATA TCTCATTAAA AAGCCCATAA ATAATAGGGG 
3401 AGAAGAAAGC CTTAGGTATC AATTCCAAAA CAGTGATTGA AATTTCCCAA 
3451 AATAATTATG GCTTCTGTCA TCTCCAGAGA TAATCTGGCT TGGTTTACCC 
3501 CATAATCTAA TTTCAGAAAA GAAAGCTTTA TTTTAACACT CATCTGAATC 
3551 AACATTAAAG CCTTTTCTCT CAAAGCGTTT ATTGAGAAAC TCAAATGAAT 
3601 ATACTTTTTG AATTACTGTC ATCAAAAGTG TACGGCTTCC TGTGCTGCTT 
3651 GTGTCAAATG GAACCTGCCC TCTAAAGCAC TTTCTTTCCT TTACTTGCGT 
3701 GGTTTCATGT AAGCTGTGCT GTTTAGAAAC AACATCTCAG ACTTTACAAA 
3751 GAAATGACAA AGAAGGCAAT TGCACTTTTT AAGGGATATC GACAAGCAGT 
3801 TTCTGTTTTC TAAAGGACAA AATACAGAGT GTGTGTCATT TTTAATTAGA 
3851 TTCTTTCCCC TGCTGAGTTG GAAATTCCAG TGCAGCACTG ATTGACCACA 
3901 GTTGCCAATC TAAAAGCACA AAGACAGAAG TAAAGCTTTA TGCTAATTTT 
3951 ATTTCAATAT GATAGAAAAT TTATCTTGGT ATGTCCTTTT TTAGATAACT 
4001 CCAGCAGGAA ACTGTAACTG CTATGTCTTT AGGAAAACGT AGAAGAAAGA 
4051 ACATTATTAT TCTTTAATTC CTACAAGGTA CTTGAAAACC TTAAGTGAAA 
4101 AAGATTTCTA TCTTTTTATC TTGGCGCATT TATGGAAAAA ATATTAACTG 
4151 TCCTGAATAT TTTATAATTT TGTAGGAAAA ATATGCATCT ATTTTTTCTT 
4201 GACTTCTTTT ATATAGTAAT AAAAGTTATT TTGGAAAAAA AAAAAAAAAA 
4251 AAAA 



BLAST Results 



Entry HSG20626 from database EMBL: 
human STS A005Z27. 
Score = 860, P = 3.0e-32, identities = 176/181 



Medline entries 



89030633: 

The structure of cytochrome b561, a secretory vesicle-specific electron 
transport protein. 



Peptide information for frame 2 



ORF from 74 bp to 931 bp; peptide length: 286 
Category: strong similarity to known protein 
Classification: unset 
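
The ORF coordinates and the peptide length quoted above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the coordinates are 1-based and inclusive and that the stop codon is not counted in the span, and the function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: the span covers sense codons only (no stop codon).
    """
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3


# ORF from 74 bp to 931 bp, as reported for frame 2 of this clone
print(orf_peptide_length(74, 931))
```

Under these assumptions the span of 858 bases corresponds to 286 codons, which agrees with the stated peptide length.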



1 MAMEGYRRFL ALLGSALLVG FLSVIFALVW VLHYREGLGW DGSALEFNWH 
51 PVLMVTGFVF IQGIAIIVYR LPWTWKCSKL LMKSIHAGLN AVAAILAIIS 
101 VVAVFENHNV NNIANMYSLH SWVGLIAVIC YLLQLLSGFS VFLLPWAPLS 
151 LRAFLMPIHV YSGIVIFGTV IATALMGLTE KLIFSLRDPA YSTFPPEGVF 
201 VNTLGLLILV FGALIFWIVT RPQWKRPKEP NSTILHPNGG TEQGARGSMP 
251 AYSGNNMDKS DSELNNEVAA RKRNLALDEA GQRSTM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_7e22, frame 2 
SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score = 460, P = 1.3e-43 
PIR:S01167 cytochrome b561 - bovine, N = 1, Score = 457, P = 2.7e-43 
SWISSPROT:C561_PIG CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score = 452, P = 9.1e-43 
PIR:S53321 cytochrome B561 - human, N = 1, Score = 451, P = 1.2e-42 

>SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 
Length = 252 

HSPs: 

Score = 460 (69.0 bits), Expect = 1.3e-43, P = 1.3e-43 
Identities = 96/218 (44%), Positives = 131/218 (60%) 

Query: 18 LVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVFIQGIAIIVYRLPWTWKC 77 

          L+G V W+ YR G+ W+ SAL+FN HP+ MV G VF+QG A++VYR+ 
Sbjct: 23 LLGLTVVAMTGAWLGMYRGGIAWE-SALQFNVHPLCMVIGLVFLQGDALLVYRV--FRNE 79 

Query: 78 SKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLHSWVGLIAVICYLLQLLS 137 

          +K K +H L+ A ++A++ +VAVFE+H A++YSLHSW G++ + Q L 
Sbjct: 80 AKRTTKVLHGLLHVFAFVIALVGLVAVFEHHRKKGYADLYSLHSWCGILVFALFFAQWLV 139 

Query: 138 GFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTEKLIFSLRDPAYSTFPPE 197 

           GFS FL P A SLR+ P HV+ G IF +ATAL+GL E L+F L YSTF PE 
Sbjct: 140 GFSFFLFPGASFSLRSRYRPQHVFFGAAIFLLSVATALLGLKEALLFEL-GTKYSTFEPE 198 

Query: 198 GVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTIL 235 

           GV N LGLL+ F ++ +I+TR WKRP + L 
Sbjct: 199 GVLANVLGLLLAAFATVVLYILTRADWKRPTQAEEQAT. 236 



Pedant information for DKFZphfbr2_7e22, frame 2 

Report for DKFZphfbr2_7e22.2 

[LENGTH] 286 

[MW] 31638.58 

[pI] 9.12 

[HOMOL] SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 4e-40 

[PIRKW] transmembrane protein 9e-40 

[KW] SIGNAL_PEPTIDE 40 

[KW] TRANSMEMBRANE 5 

[KW] LOW_COMPLEXITY 4.90 % 



SEQ MAMEGYRRFLALLGSALLVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVF 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhcchhhhhhhhhccccccccccccccccchhhhhhhhh 

MEM MMMMMMMMMMMM 

SEQ IQGIAIIVYRLPWTWKCSKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLH 

SEG xxxxxxxxxxxxxx 

PRD ccccceeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeecc 

MEM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SWVGLIAVICYLLQLLSGFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTE 

SEG 

PRD cccchhhhhhhhhhhhhhheeeeccccccccccccccceeeeeeeeeeehhhhhhhhhhh 

MEM . . . .MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . . 

SEQ KLIFSLRDPAYSTFPPEGVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTILHPNGG 

SEG 

PRD hhhhhhhccccccccccchhhhhhhhhhhhhhhheeeeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ TEQGARGSMPAYSGNNMDKSDSELNNEVAARKRNLALDEAGQRSTM 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccc 

MEM 
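
The LOW_COMPLEXITY percentage in the report header appears to correspond to the share of residues flagged with `x` on the SEG lines. The sketch below illustrates that reading; it is an assumption about how the figure was derived, not a description of the Pedant software, and the name `masked_percent` is hypothetical.

```python
def masked_percent(seg_mask: str, protein_length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask string.

    Assumption: every 'x' marks exactly one low-complexity residue.
    """
    return 100.0 * seg_mask.count("x") / protein_length


# The SEG lines of this report contain a single run of 14 'x' characters,
# and the peptide is 286 residues long.
seg_mask = "x" * 14
print(f"{masked_percent(seg_mask, 286):.2f} %")
```

With 14 flagged residues out of 286 this gives about 4.90 %, in line with the value listed under [KW] LOW_COMPLEXITY.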



(No Prosite data available for DKFZphfbr2_7e22.2) 
(No Pfam data available for DKFZphfbr2_7e22.2) 






DKFZphfbr2_7j4 



group: brain derived 

DKFZphfbr2_7j4 encodes a novel 233 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, 1 EST hit 

Sequenced by GBF 

Locus: unknown 

Insert length: 1050 bp 

Poly A stretch at pos. 1027, polyadenylation signal at pos. 1007 



1 GGGGACACAA AGGGGTGGTC ACCCTGCCCT CACCTTGACC TGTAAGTTGC 

51 CTAGGACAGT GGCCTGGTCC CAGGGGCTGT TGTGGGGAGT TGAAGAACAC 

101 CCTGGCCTCC TCCATCATGT CGGCCAAGAG GGCAGAATTG AAGAAAACAC 

151 ATCTGTGCAA GAACTACAAG GCAGTTTGCC TGGAATTGAA GCCAGAGCCG 

201 ACCAAAACAT TTGATTACAA AGCAGTTAAA CAAGAAGGGC GGTTTACCAA 

251 AGCAGGAGTG ACACAGGACC TAAAGAATGA ACTCAGGGAA GTGAGAGAAG 

301 AGCTCAAGGA GAAAATGGAG GAGATAAAAC AGATAAAGGA TCTAATGGAC 

351 AAGGATTTTG ATAAACTTCA CGAATTTGTG GAAATTATGA AGGAAATGCA 

401 GAAAGATATG GATGAGAAGA TGGACATTTT AATAAATACA CAGAAGAACT 

451 ATAAGCTTCC CCTTAGAAGA GCACCAAAGG AGCAGCAGGA ACTCAGGCTG 

501 ATGGGAAAGA CTCACAGAGA ACCACAGCTC AGGCCCAAGA AAATGGATGG 

551 AGCCAGTGGA GTCAATGGAG CACCCTGTGC TCTTCACAAG AAGACGATGG 

601 CACCACAAAA AACAAAACAG GGCTCACTGG ATCCCCTTCA TCACTGTGGG 

651 ACCTGCTGCG AGAAATGTTT GTTGTGTGCT CTAAAGAACA ACTACAATCG 

701 GGGGAACATT CCTTCAGAGG CCTCAGGCCT TTACAAAGGT GGAGAGGAGC 

751 CAGTGACCAC CCAACCTTCT GTGGGCCACG CTGTGCCTGC CCCAAAGTCC 

801 CAGACTGAGG GAAGGTGAAG CTTAACTGCC AGCTTGAAAT GAGAGTAAAG 

851 AAGATACAGA GCAAACAGTG TTTCAGAAAC TGTCCTGCCC TGGGTGTGAT 

901 TCTTTGGCTT CAATTTGAAG GAGGAGGAAT GATGGGATTT CATATTTTAT 

951 TTCACACCAG TTCCTCCTTG TTTCATCTCT TTGCTAAGCT GGCTGCTTCT 

1001 ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA 
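
The polyadenylation signal and poly A positions quoted for this insert can be located in the last row of the sequence. The sketch below assumes the canonical AATAAA hexamer as the signal and treats a terminal run of at least ten A residues as the poly A stretch; both thresholds and the function name `find_polya_features` are assumptions made for illustration. The reported positions appear to match 0-based offsets into the insert.

```python
import re

# Final 50 bases of the 1050 bp insert (the row labelled 1001), spaces removed.
TAIL = "".join(
    "ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA".split()
)
TAIL_OFFSET = 1000  # number of bases that precede TAIL in the insert


def find_polya_features(tail: str, offset: int, min_run: int = 10):
    """Return 0-based offsets of the first AATAAA hexamer and of the
    terminal poly A run (None where a feature is absent)."""
    signal = tail.find("AATAAA")
    run = re.search(r"A{%d,}$" % min_run, tail)
    signal_pos = offset + signal if signal >= 0 else None
    polya_pos = offset + run.start() if run else None
    return signal_pos, polya_pos


print(find_polya_features(TAIL, TAIL_OFFSET))
```

Read this way, the hexamer starts at offset 1007 and the terminal A run at offset 1027, the two positions given in the clone description.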



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 117 bp to 815 bp; peptide length: 233 
Category: putative protein 



1 MSAKRAELKK THLCKNYKAV CLELKPEPTK TFDYKAVKQE GRFTKAGVTQ 
51 DLKNELREVR EELKEKMEEI KQIKDLMDKD FDKLHEFVEI MKEMQKDMDE 
101 KMDILINTQK NYKLPLRRAP KEQQELRLMG KTHREPQLRP KKMDGASGVN 
151 GAPCALHKKT MAPQKTKQGS LDPLHHCGTC CEKCLLCALK NNYNRGNIPS 
201 EASGLYKGGE EPVTTQPSVG HAVPAPKSQT EGR 
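
The peptide above is the conceptual translation of the ORF starting at bp 117. As a sketch of that step, the following Python translates the first 13 codons of the ORF with the standard genetic code. The helper names (`CODON_TABLE`, `translate`, `ORF_START`) are hypothetical, and the 64-letter amino acid string is the usual compact encoding of the standard table with bases ordered T, C, A, G.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; trailing partial codons are ignored."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 117-155 of the insert, read from the rows labelled 101 and 151.
ORF_START = "ATGTCGGCCAAGAGGGCAGAATTGAAGAAAACACATCTG"
print(translate(ORF_START))
```

The output, MSAKRAELKKTHL, matches the first 13 residues of the listed peptide.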

BLASTP hits 

Entry JC2223 from database PIR: 

major surface glycoprotein 3 - Pneumocystis carinii (fragment) 
Score = 109, P = 3.5e-04, identities = 41/136, positives = 67/136 
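
The identity and positive counts in a hit line are easier to compare across hits when expressed as percentages. A minimal sketch, with the hypothetical helper `percent`:

```python
def percent(count: int, aligned: int) -> float:
    """Share of aligned positions, in percent."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * count / aligned


# identities = 41/136, positives = 67/136 for the PIR entry JC2223
print(f"identities {percent(41, 136):.1f} %, positives {percent(67, 136):.1f} %")
```

For this hit the figures come to roughly 30 % identity and 49 % positives, which is consistent with the weak similarity reported for the clone.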

Alert BLASTP hits for DKFZphfbr2_7j4, frame 3 

TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for 
P115C, partial sequence., N = 1, Score = 109, P = 0.00024 

>TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for P115C, 
partial sequence. 
Length = 196 

HSPs: 

Score = 109 (16.4 bits), Expect = 2.4e-04, P = 2.4e-04 
Identities = 41/134 (30%), Positives = 67/134 (50%) 

Query: 14 CKN-YKAVCLELKPEPTKTFDYKAVKQEGRFTKA-GVTQDLKNELREVREELKEKMEEIK 71 

          CK K C ELK + K VK+ TK G ++LK+++++ E KE++E K 
Sbjct: 22 CKTELKKYCEELKEADGLKVNDK-VKEICDDTKRDGKCKELKDKVKKELETFKEELE--K 78 

Query: 72 QIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAPKEQQELRLMGK 131 

          +KD+ D++ +K E +++E D D K + + + YKL +R E LR +GK 
Sbjct: 79 ALKDIKDENCEKYEEKCILLEETNHD-DVKKNCVKLREGCYKLKRKRVA-EDLLLRALGK 136 

Query: 132 THREPQLRPKKMDGAS 147 

           + + K D S 
Sbjct: 137 DVKNGECEKKMKDVCS 152 



Pedant information for DKFZphfbr2_7j4, frame 3 

Report for DKFZphfbr2_7j4.3 

[LENGTH] 233 

[MW] 26533.95 

[pI] 9.18 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 

[PROSITE] PKC_PHOSPHO_SITE 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 14.59 % 

[KW] COILED_COIL 13.73 % 



SEQ MSAKRAELKKTHLCKNYKAVCLELKPEPTKTFDYKAVKQEGRFTKAGVTQDLKNELREVR 

SEG xxxxxxxxx 

PRD ccchhhhhhhhhhccchhhhhhhcccccccccccceeecccccccccccchhhhhhhhhh 

COILS CCCCCCCCCCCC 

SEQ EELKEKMEEIKQIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAP 

SEG xxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhchhhhhhhhhcccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ KEQQELRLMGKTHREPQLRPKKMDGASGVNGAPCALHKKTMAPQKTKQGSLDPLHHCGTC 

SEG 

PRD hhhhhhhhhccccccccccccccccccccccccchhhhhhcccccccccccccccccccc 

COILS 

SEQ CEKCLLCALKNNYNRGNIPSEASGLYKGGEEPVTTQPSVGHAVPAPKSQTEGR 

SEG 

PRD chhhhhhhccccccccccccccccccccccccccccccccccccccccccccc 

COILS 









Prosite for DKFZphfbr2_7j4.3 

PS00005 2->5 PKC_PHOSPHO_SITE PDOC00005 
PS00005 108->111 PKC_PHOSPHO_SITE PDOC00005 
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005 
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006 
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006 
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006 
PS00008 151->157 MYRISTYL PDOC00008 
PS00008 196->202 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphfbr2_7j4.3) 

DKFZphfbr2_82c20 



group: transmembrane protein 

DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with very weak similarity to C. 
elegans cosmid D1007. 

The novel protein contains 7 transmembrane regions. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C. elegans D1007.5; 
membrane regions: 7 

Summary DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with 
similarity to a hypothetical C. elegans protein. 



similarity to C.elegans D1007.5 

complete cDNA (Bp 1-100 GC rich), complete cds, 
potential start at Bp 128 matches Kozak consensus PuNNatgG, 
EST hits, localisation? primer B of STS does not match perfectly! 
TRANSMEMBRANE 7 

Sequenced by DKFZ 

Locus: /map="109.9 cR from top of Chr1 linkage group"??? 
Insert length: 1804 bp 

Poly A stretch at pos. 1794, no polyadenylation signal found 



1 CGGCGGGAGC GCGCGGCTGA TACCCGGGAC TGGGCTGCGG CGGTTAGTCC 
51 TCTCCCGGCC GCCGTCGCCT CCGACATATT GCTCGCAGGA GCTGCGGCGG 
101 CGAAGCGGAG AGCACCGGGG GGAGGAGATG GGAGGACGAA GAGGTCCCAA 
151 CAGGACATCT TACTGTCGAA ATCCGCTCTG TGAGCCGGGA TCCTCGGGGG 
201 GCTCTAGTGG AAGCCACACT TCCAGTGCAT CGGTGACCAG TGTTCGTTCC 
251 CGCACCAGGA GCAGTTCTGG AACAGGCCTC TCCAGCCCTC CTCTGGCCAC 
301 CCAAACTGTT GTGCCTCTAC AGCACTGCAA GATCCCCGAG CTGCCAGTCC 
351 AGGCCAGCAT TCTGTTTGAG TTGCAGCTCT TCTTCTGCCA GCTCATAGCA 
401 CTCTTCGTCC ACTACATCAA CATCTACAAG ACAGTGTGGT GGTATCCACC 
451 TTCCCACCCA CCCTCCCACA CCTCCCTGAA CTTCCATCTG ATCGACTTCA 
501 ACTTGCTGAT GGTGACCACC ATCGTTCTGG GCCCCCGCTT CATTGGGTCC 
551 ATCGTGAAGG AGGCCTCTCA GAGGGGGAAG GTCTCCCTCT TTCGCTCCAT 
601 CCTGCTGTTC CTCACTCGCT TCACCGTTCT CACGGCAACA GGCTGGAGTC 
651 TGTGCCGATC CCTCATCCAC CTCTTCAGGA CCTACTCCTT CCTGAACCTC 
701 CTGTTCCTCT GCTATCCGTT TGGGATGTAC ATTCCGTTCC TGCAGCTGAA 
751 TTGCGACCTC CGCAAGACAA GCCTCTTCAA CCACATGGCC TCCATGGGGC 
801 CCCGGGAGGC GGTCAGTGGC CTGGCAAAGA GCCGGGACTA CCTCCTGACA 
851 CTGCGGGAGA CGTGGAAGCA GCACACAAGA CAGCTGTATG GCCCGGACGC 
901 CATGCCCACC CATGCCTGCT GCCTGTCACC CAGCCTCATC CGCAGTGAGG 
951 TGGAGTTCCT CAAGATGGAC TTCAACTGGC GCATGAAGGA AGTGCTCGTC 

1001 AGCTCCATGC TGAGCGCCTA CTATGTGGCC TTTGTGCCTG TCTGGTTCGT 
1051 GAAGAACACA CATTACTATG ACAAGCGCTG GTCCTGTGAA CTCTTCCTGC 
1101 TGGTGTCCAT CAGCACCTCC GTGATCCTCA TGCAGCACCT GCTGCCTGCC 
1151 AGCTACTGTG ACCTGCTGCA CAAGGCCGCC GCCCATCTGG GCTGTTGGCA 
1201 GAAGGTGGAC CCAGCGCTGT GCTCCAACGT GCTGCAGCAC CCGTGGACTG 
1251 AAGAATGCAT GTGGCCGCAG GGCGTGCTGG TGAAGCACAG CAAGAACGTC 
1301 TACAAAGCCG TAGGCCACTA CAACGTGGCT ATCCCCTCTG ACGTCTCCCA 
1351 CTTCCGCTTC CATTTCTTTT TCAGCAAACC TCTGCGGATC CTCAACATCC 
1401 TCCTGCTGCT GGAGGGCGCT GTCATTGTCT ATCAGCTGTA CTCCCTAATG 

1451 TCCTCTGAAA AGTGGCACCA GACCATCTCG CTGGCCCTCA TCCTCTTCAG 
1501 CAACTACTAT GCCTTCTTCA AGCTGCTCCG GGACCGCTTG GTATTGGGCA 
1551 AGGCCTACTC ATACTCTGCT AGCCCCCAGA GAGACCTGGA CCACCGTTTC 
1601 TCCTGAGCCC TGGGGTCACC TCAGGGACAG CGTCCAGGCT TCAGCCAAGG 
1651 GCTCCCTGGC AAGGGGCTGT TGGGTAGAAG TGGTGGTGGG GGGGACAAAA 
1701 GACAAAAAAA TCCACCAGAG CTTTGTATTT TTGTTACGTA CTGTTTCTTT 
1751 GATAATTGAT GTGATAAGGA AAAAAGTCCT ATTTTTATAC TCCCAAAAAA 
1801 AAAA 
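
The clone notes state that the potential start codon at Bp 128 lies in a Kozak context. A minimal sketch of such a check follows, assuming the commonly cited core of the Kozak consensus: a purine at position -3 and a G at position +4 relative to the A of ATG. The regular expression and the names `ROW_101`, `ROW_OFFSET` and `has_kozak_core` are illustrative assumptions.

```python
import re

# Row labelled 101 of the insert (bases 101-150), spaces removed.
ROW_101 = "".join(
    "CGAAGCGGAG AGCACCGGGG GGAGGAGATG GGAGGACGAA GAGGTCCCAA".split()
)
ROW_OFFSET = 100  # number of bases that precede ROW_101 in the insert

# Assumed Kozak core: purine at -3, any base at -2 and -1, ATG, then G at +4.
KOZAK_CORE = re.compile(r"[AG]..ATGG")


def has_kozak_core(row: str, offset: int, atg_pos: int) -> bool:
    """Check the 7-base window around an ATG given its 1-based position."""
    i = atg_pos - 1 - offset          # 0-based index of the A of ATG in row
    window = row[i - 3:i + 4]         # positions -3 .. +4
    return KOZAK_CORE.fullmatch(window) is not None


print(has_kozak_core(ROW_101, ROW_OFFSET, 128))
```

The window around Bp 128 reads GAGATGG, so the check succeeds under the stated assumption.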



BLAST Results 



Entry HS285343 from database EMBL: 
human STS WI-17488 . 
Score = 1225, P = 1.3e-50, identities = 263/281 



Medline entries 



No Medline entry 



Peptide information for frame 2 



1 MGGRRGPNRT SYCRNPLCEP GSSGGSSGSH TSSASVTSVR SRTRSSSGTG 

51 LSSPPLATQT VVPLQHCKIP ELPVQASILF ELQLFFCQLI ALFVHYINIY 

101 KTVWWYPPSH PPSHTSLNFH LIDFNLLMVT TIVLGRRFIG SIVKEASQRG 

151 KVSLFRSILL FLTRFTVLTA TGWSLCRSLI HLFRTYSFLN LLFLCYPFGM 

201 YIPFLQLNCD LRKTSLFNHM ASMGPREAVS GLAKSRDYLL TLRETWKQHT 

251 RQLYGPDAMP THACCLSPSL IRSEVEFLKM DFNWRMKEVL VSSMLSAYYV 

301 AFVPVWFVKN THYYDKRWSC ELFLLVSIST SVILMQHLLP ASYCDLLHKA 

351 AAHLGCWQKV DPALCSNVLQ HPWTEECMWP QGVLVKHSKN VYKAVGHYNV 

401 AIPSDVSHFR FHFFFSKPLR ILNILLLLEG AVIVYQLYSL MSSEKWHQTI 

451 SLALILFSNY YAFFKLLRDR LVLGKAYSYS ASPQRDLDHR FS 

ORF from 128 bp to 1603 bp; peptide length: 492 
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (210-232) 
LEUCINE_ZIPPER (210-232) 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82c20, frame 2 

TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid 
D1007., N = 2, Score = 247, P = 4.6e-29 



>TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 
Length = 512 

HSPs: 



Score = 247 (37.1 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29 
Identities = 58/204 (28%), Positives = 102/204 (50%) 

Query: 291 VSSMLSAYYVAFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKA 350 

           +S ML +V F + ++ W C+L ++V ++ + + +L P +Y DLLH+A 
Sbjct: 299 LSIMLPCIFVPFKTSQGIPQKILINEVWECQLAIVVGLTAFSLYVAYLSPLNYLDLLHRA 358 

Query: 351 AAHLGCWQKVD-PAL CSNVLQHPWTEECMWPQGVLVKHSKN- VYKAVGHYNV 400 

           A HLG W +++ P + + PW+E C++ G V+ Y+A ++ 
Sbjct: 359 AIHLGSWHQIEGPRIGHTGSMSSAPTPWSEFCLYNDGETVQMPDGRCYRAKSSNSIRTVA 418 

Query: 401 AIPSDVSHFRFHFFFSKPLRILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNY 460 

           A P H F KP ++NI+ E +I Q + L+ + W ++ L++F+NY 
Sbjct: 419 AHPESSRHNTFFKVLRKPNNLINIMCSFEFLLIFIQFWMLVLTNDWQHIVTFVLLMFANY 478 

Query: 461 YAFFKLLRDRLVLGKAYSYSASPQRDL 487 

           F KL +D+++L + Y S Q DL 
Sbjct: 479 LLFAKLFKDKIILSRIYEPS QEDL 502 

Score = 178 (26.7 bits), Expect = 4.3e-21, Sum P(2) = 4.3e-21 
Identities = 50/179 (27%), Positives = 90/179 (50%) 

Query: 262 HACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYVAFVPVWFV--KNTHYYDKR-- 317 

           H C SP+ IR E++ L D R+K+ + + + +A+ +P FV K + ++ 
Sbjct: 262 HMCSDSPAQIREEIQVLIDDLVLRVKKSIFAGVSTAFLSIMLPCIFVPFKTSQGIPQKIL 321 

Query: 318 WSCELFLLVSISTSVILMQH LLPAS YC DLLHKAAAHLGCWQK VD - PAL CSNV 368 

           W C+L ++V ++ + + +L P +Y DLLH+AA HLG W +++ P + + 
Sbjct: 322 INEVWECQLAIVVGLTAFSLYVAYLSPLNYLDLLHRAAIHLGSWHQIEGPRIGHTGSMSS 381 

Query: 369 LQHPWTEECMWPQGVLVKHSKN-VYKAVGHYNV-AIPSDVSHFRFHFFFSKPLRILNILL 426 

           PW+E C++ G V+ Y+A ++ + + R t FF K LR N L+ 
Sbjct: 382 APTPWSEFCLYNDGETVQNPDGRCYRAKSSNSIRTVAAHPESSRHNTFF-KVLRKPNNLI 440 

Score = 146 (21.9 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29 
Identities = 34/86 (39%), Positives = 50/86 (58%) 

Query: 52 SSPPLATQTVVPLQHCKIPELP-VQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSH 110 

          +S P A+ + + H P++ Q + FE LF ++ALF+ Y+NIYKT+WW P S+ 
Sbjct: 19 ASIPRASGVTLSV-HPIWPDIQFTQGELFFECTLFLYSVLALFLQYLNIYKTLWWLPKSY 77 

Query: 111 PPSHTSLNFHLIDFNLLMVTTIVLGRR 137 

           H SL FHLI+ L ++LG R 

Sbjct: 78 --WHYSLKFHLINPYFLSCVGLLLGWR 102 

Score = 39 (5.9 bits), Expect = 6.8e-18, Sum P(2) = 6.8e-18 
Identities = 12/41 (29%), Positives = 20/41 (48%) 

Query: 154 LFRSILLFLTRFTVLTATGWSLCRSLIHLFRTYSFLNLLFL 194 

           L+ + LFL ++ + T W L +S H + +N FL 
Sbjct: 53 LYSVLALFL-QYLNIYKTLWWLPKSYWHYSLKFHLINPYFL 92 



Pedant information for DKFZphfbr2_82c20, frame 2 



Report for DKFZphfbr2_82c20.2 

[LENGTH] 492 

[MW] 55274.05 

[pI] 9.51 

[HOMOL] TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 4e-31 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] AMIDATION 2 

[PROSITE] MYRISTYL 5 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 7 

[KW] LOW_COMPLEXITY 8.74 % 



SEQ MGGRRGPNRTSYCRNPLCEPGSSGGSSGSHTSSASVTSVRSRTRSSSGTGLSSPPLATQT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccccccccccccee 

MEM 

SEQ VVPLQHCKIPELPVQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSHPPSHTSLNFH 

SEG 

PRD eeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMM 

SEQ LIDFNLLMVTTIVLGRRFIGSIVKEASQRGKVSLFRSILLFLTRFTVLTATGWSLCRSLI 

SEG 

PRD eeehhhhhhhhhhhhheeeehhhhhhhcccchhhhhhhhhhhhhhhhhhcccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ HLFRTYSFLNLLFLCYPFGMYIPFLQLNCDLRKTSLFNHMASMGPREAVSGLAKSRDYLL 

SEG 

PRD hhhhhhhhheeeeeeecccccceeeeccccchhhhhhhhhhccchhhhhhhhhhhhhhhh 

MEM 



SEQ TLRETWKQHTRQLYGPDAMPTHACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYV 

SEG 

PRD hhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhcchhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ AFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKAAAHLGCWQKV 

SEG 

PRD heeeeeeeeccccccchhhhhhhhhhhcchhhhhhhhhhccchhhhhhhhhhhhhhhccc 

MEM MMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DPALCSNVLQHPWTEECMWPQGVLVKHSKNVYKAVGHYNVAIPSDVSHFRFHFFFSKPLR 

SEG xx 

PRD ccccccccccccccceeecccceeeeeccceeeeccccccccccccccceeeeeecccch 

MEM MMMMMMMMMM 

SEQ ILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNYYAFFKLLRDRLVLGKAYSYS 

SEG xxxxxxxx 

PRD hhhhhhhhhhheeeeehhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 

SEQ ASPQRDLDHRFS 

SEG 

PRD ccchhhhhhccc 

MEM 



Prosite for DKFZphfbr2_82c20.2 

PS00001 8->12 ASN_GLYCOSYLATION PDOC00001 
PS00002 47->51 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 212->216 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 316->320 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005 
PS00005 443->446 PKC_PHOSPHO_SITE PDOC00005 
PS00006 241->245 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00008 21->27 MYRISTYL PDOC00008 
PS00008 24->30 MYRISTYL PDOC00008 
PS00008 28->34 MYRISTYL PDOC00008 
PS00008 48->54 MYRISTYL PDOC00008 
PS00008 231->237 MYRISTYL PDOC00008 
PS00009 2->6 AMIDATION PDOC00009 
PS00009 134->138 AMIDATION PDOC00009 
PS00029 168->190 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphfbr2_82c20.2) 

DKFZphfbr2_82el7 



group: transmembrane protein 

DKFZphfbr2_82el7 encodes a novel 311 amino acid protein with very weak similarity to C. 
elegans cosmid R01B10. 

The novel protein contains 6 transmembrane regions. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to C. elegans "R01B10.5"; 
membrane regions: 6 

Summary DKFZphfbr2_82el7 encodes a novel 311 amino acid protein with 
similarity to a hypothetical C. elegans protein. 



similarity to C. elegans "R01B10.5" 

complete cDNA, EST HS763158 extends the sequence, complete cds, EST 
hits 

six potential transmembrane domains 
Sequenced by DKFZ 

Locus: /map="779_C_?; 813_A_1; 877_C_1; 734_C_12; 760_E_11; 171.7 cR from top of Chr14 linkage 
group" 

Insert length: 1618 bp 

Poly A stretch at pos. 1608, polyadenylation signal at pos. 1588 



1 CTGATCTAGT GCTTCTCGAA AAAAACCTTC AGGCGGCCCA TGGCTGTCGA 
51 TATTCAACCA GCATGCCTTG GACTTTATTG TGGGAAGACC CTATTATTTA 
101 AAAATGGCTC AACTGAAATA TATGGAGAAT GTGGGGTATG CCCAAGAGGA 
151 CAGAGAACGA ATGCACAGAA ATATTGTCAG CCTTGCACAG AATGTCCTGA 
201 ACTTTATGAT TGGCTCTATC TTGGATTTAT GGCAATGCTT CCTCTGCTTT 
251 TACATTGGTT CTTCATTGAA TGGTACTCGG GGAAAAAGAG TTCCAGCGCA 
301 CTTTTCCAAC ACATCACTGC ATTATTTGAA TGCAGCATGG CAGCTATTAT 
351 CACCTTACTT GTGAGTGATC CAGTTGGTGT TCTTTATATT CGTTCATGTC 
401 GAGTATTGAT GCTTTCTGAC TGGTACACGA TGCTTTACAA CCCAAGTCCA 
451 GATTACGTTA CCACAGTACA CTGTACTCAT GAAGCCGTCT ACCCACTATA 
501 TACCATTGTA TTTATCTATT ACGCATTCTG CTTGGTATTA ATGATGCTGC 
551 TCCGACCTCT TCTGGTGAAG AAGATTGCAT GTGGGTTAGG GAAATCTGAT 
601 CGATTTAAAA GTATTTATGC TGCACTTTAC TTCTTCCCAA TTTTAACCGT 
651 GCTTCAGGCA GTTGGTGGAG GCCTTTTATA TTACGCCTTC CCATACATTA 
701 TATTAGTGTT ATCTTTGGTT ACTCTGGCTG TGTACATGTC TGCTTCTGAA 
751 ATAGAGAACT GCTATGATCT TCTGGTCAGA AAGAAAAGAC TTATTGTTCT 
801 CTTCAGCCAC TGGTTACTTC ATGCCTATGG AATAATCTCC ATTTCCAGAG 
851 TGGATAAACT TGAGCAAGAT TTGGGGCTTT TGGCTTTGGT ACCTACACCA 
901 GCCCTTTTTT ACTTGTTCAC TGCAAAATTT ACCGAACCTT CAAGGATACT 
951 CTCAGAAGGA GCCAATGGAC ACTGAGTGTA GACATGTGAA ATGCCAAAAA 
1001 CCTGAGAAGT GCTCCTAATA AAAAAGTAAA TCAATCTTAA CAGTGTATGA 
1051 GAACTATTCT ATCATATATG GGAACAAGAT TGTCAGTATA TCTTAATGTT 
1101 TGGGTTTGTC TTTGTTTTGT TTATGGTTAG ACTTACAGAC TTGGAAAATG 
1151 CAAAACTCTG TAATACTCTG TTACACAGGG TAATATTATC TGCTACACTG 
1201 GAAGGCCGCT AGGAAGCCCT TGCTTCTCTC AACAGTTCAG CTGTTCTTTA 
1251 GGGCAAAATC ATGTTTCTGT GTACCTAGCA ATGTGTTCCC ATTTTATTAA 
1301 GAAAAGCTTT AACACGTGTA ATCTGCAGTC CTTAACAGTG GCGTAATTGT 
1351 ACGTACCTGT TGTGTTTCAG TTTGTTTTTC ACCTATAATG AATTGTAAAA 
1401 ACAAACATAC TTGTGGGGTC TGATAGCAAA CATAGAAATG ATGTATATTG 
1451 TTTTTTGTTA TCTATTTATT TTCATCAATA CAGTATTTTG ATGTATTGCA 
1501 AAAATAGATA ATAATTTATA TAACAGGTTT TCTGTTTATA GATTGGTTCA 
1551 AGATTTGTTT GGATTATTGT TCCTGTAAAG AAAACAATAA TAAAAAGCTT 
1601 ACCTACATAA AAAAAAAA 



BLAST Results 



Entry HS981146 from database EMBL: 
human STS WI-6253. 
Length = 208 
Minus Strand HSPs: 

Score = 1040 (156.0 bits), Expect = 1.9e-40, P = 1.9e-40 
Identities = 208/208 (100%), Positives = 208/208 (100%), Strand = Minus / Plus 

Entry HSG20716 from database EMBL: 
human STS A006D06. 
Length = 195 
Minus Strand HSPs: 

Score = 975 (146.3 bits), Expect = 1.8e-37, P = 1.8e-37 
Identities = 195/195 (100%), Positives = 195/195 (100%), Strand = Minus / Plus 



Medline entries 



No Medline entry 



Peptide information for frame 1 



1 MAVDIQPACL GLYCGKTLLF KNGSTEIYGE CGVCPRGQRT NAQKYCQPCT 

51 ESPELYDWLY LGFMAMLPLV LHWFFIEWYS GKKSSSALFQ HITALFECSM 

101 AAIITLLVSD PVGVLYIRSC RVLMLSDWYT MLYNPSPDYV TTVHCTHEAV 

151 YPLYTIVFIY YAFCLVLMML LRPLLVKKIA CGLGKSDRFK SIYAALYFFP 

201 ILTVLQAVGG GLLYYAFPYI ILVLSLVTLA VYMSASEIEN CYDLLVRKKR 

251 LIVLFSHWLL HAYGIISISR VDKLEQDLPL LALVPTPALF YLFTAKFTEP 
301 SRILSEGANG H 



ORF from 40 bp to 972 bp; peptide length: 311 
Category: similarity to unknown protein 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82el7, frame 1 

TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid 
R01B10., N = 1, Score = 399, P = 1.4e-36 



>TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 
Length = 670 

HSPs: 

Score = 399 (59.9 bits), Expect = 1.4e-36, P = 1.4e-36 
Identities = 95/280 (33%), Positives = 152/280 (54%) 

Query: 2 AVDIQPACLGLYCGKTLLFKN------------GSTEIYGECGVCPRGQRTNAQKYCQPC 49 

         A IQP+CLG +CG+T+L N GST + CG C G R NA C+ C 

Sbjct: 292 ASTIQPSCLG-FCGRTVLVGNYSEDVEATTTAAGSTSL-SRCGPCSFGYRNNAMSICESC 349 

Query: 50 TESPELYDWLYLGFMAMLPLVLHWFFIEWYSGKKSSSALFQ---HITALFECSMAAIITL 106 

          + YDW+YL F+A+LPL+LH FI + K + ++ ++ + E +A +I + 
Sbjct: 350 DTPLQPYDWMYLLFIALLPLLLHMQFIR-IARKYCRTRYYEVSEYLCVILENVIACVIAV 408 

Query: 107 LVSDPVGVLYIRSCRVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLV 166 

           L+ P ++ C + +WY YNP Y T+ CT+E V+PLY+I FI++ + 

Sbjct: 409 LIYPPRFTFFLNGCSKTDIKEWYPACYNPRIGYTKTMRCTYEVVFPLYSITFIHHLILIG 468 

Query: 167 LMMLLRPLLVKKIACGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSL 226 

           +++LR L + L K+ K YAA+ PIL V+ AV G+++Y FPYI+L+ SL 
Sbjct: 469 SILVLRSTLYCVL---LYKTYNGKPFYAAIVSVPILAVIHAVLSGVVFYTFPYILLIGSL 525 

Query: 227 VTLAVYMSASEIENCYDLLVR----KKRLIVLFSHWLLHAYGIISI 268 

           + +++ +++VR LI L L+ ++G+I+I 

Sbjct: 526 WAMCFHLALEGKRPLKEMIVRIATSPTHLIFLSITMLMLSFGVIAI 571 



Pedant information for DKFZphfbr2_82el7, frame 1 



Report for DKFZphfbr2_82el7.1 

[LENGTH] 311 

[MW] 35239.14 

[pI] 7.91 

[HOMOL] TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 9e-36 

[PROSITE] AMIDATION 1 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 6 

[KW] LOW_COMPLEXITY 7.72 % 



SEQ MAVDIQPACLGLYCGKTLLFKNGSTEIYGECGVCPRGQRTNAQKYCQPCTESPELYDWLY 

SEG 

PRD cccccccccccccccceeeeccccceeecccccccccccccceeecccccccccchhhhh 

MEM MMMMMM 

SEQ LGFMAMLPLVLHWFFIEWYSGKKSSSALFQHITALFECSMAAIITLLVSDPVGVLYIRSC 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeece 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ RVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLVLMMLLRPLLVKKIA 

SEG xxxxxxxxxxxx . . . . 

PRD eeeeecceeeeecccccceeeeeeeceeeeeeeeceeeeehhhhhhhhhhhhhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM. . . 

SEQ CGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSLVTLAVYMSASEIEN 

SEG 

PRD eecccccchhhhhhhhhhhccccccccccccceeeecceeeeehhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ CYDLLVRKKRLIVLFSHWLLHAYGIISISRVDKLEQDLPLLALVPTPALFYLFTAKFTEP 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhcccceeeechhhhhhceeeeeecccceeeeeeeccccc 

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 

SEQ SRILSEGANGH 

SEG 

PRD ceeeeeccccc 

MEM MM 



Prosite for DKFZphfbr2_82el7.1 

PS00001 22->26 ASN_GLYCOSYLATION PDOC00001 
PS00004 82->86 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 80->83 PKC_PHOSPHO_SITE PDOC00005 
PS00005 119->122 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 294->297 PKC_PHOSPHO_SITE PDOC00005 
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006 
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006 
PS00006 269->273 CK2_PHOSPHO_SITE PDOC00006 
PS00008 11->17 MYRISTYL PDOC00008 
PS00008 37->43 MYRISTYL PDOC00008 
PS00008 182->188 MYRISTYL PDOC00008 
PS00009 80->84 AMIDATION PDOC00009 

(No Pfam data available for DKFZphfbr2_82el7.1) 






DKFZphfbr2_82e4 



group: signal transduction 

DKFZphfbr2_82e4 encodes a novel 473 amino acid protein with strong similarity to the 
calmodulin-binding proteins. 

The novel protein is similar to human and rat Ca2+/calmodulin-dependent protein kinase (EC 
2.7.1.123), rat calmodulin-binding protein, calmodulin-binding protein kinase of Fugu rubripes 
and Rattus norvegicus calcium/calmodulin-dependent protein kinase I. Calmodulin is the 
archetype of the family of calcium-modulated proteins, of which nearly 20 members have been 
found. Calmodulin is involved in the regulation of growth and the cell cycle as well as in signal 
transduction and the synthesis and release of neurotransmitters. The novel protein seems to 
be involved in calmodulin-mediated pathways in human neuronal cells. 

The new protein can find clinical application in modulating/blocking calmodulin-mediated 
pathways in human neuronal cells. 



strong similarity to calmodulin-binding proteins 

complete cDNA, complete cds, EST hits 
splice variant in comparison to rat 156542 
ESTs HSZZ54543/HS1141907 define splice variant 
see also DKFZphfbr2_82g20 unspliced form 

Sequenced by DKFZ 

Locus: /map="200.5 cR from top of Chr3 linkage group" 
Insert length: 2923 bp 

Poly A stretch at pos. 2913, polyadenylation signal at pos. 2890 



1 ATGCTGGAGG TTCGCTAGCC GAAGCGGCTG CATCTGGCGC CGCGTCTGCC 
51 CCGCGTGCTC GGAGCGGATT CTGCCCGCCG TCCCCGGAGC CCTCGGCGCC 
101 CCGCTGAGCC CGCGATCACT TCCTCCCTGT GACCAACCGG CGCTGCAGGT 
151 TAGAGCCTGG CAATGCCGTT TGGGTGTGTG ACTCTGGGTG ACAAGAAGAA 
201 CTATAACCAG CCATCGGAGG TGACTGACAG ATATGATTTG GGACAGGTCA 
251 TCAAGACTGA GGAGTTTTGT GAAATCTTCC GGGCCAAGGA CAAGACGACA 
301 GGCAAGCTGC ACACCTGCAA GAAGTTCCAG AAGCGGGACG GCCGCAAGGT 
351 GCGGAAAGCT GCCAAGAACG AGATAGGCAT CCTCAAGATG GTGAAGCATC 
401 CCAACATCCT ACAGCTGGTG GATGTGTTTG TGACCCGCAA GGAGTACTTT 
451 ATCTTCCTGG AGCTGGCCAC GGGGAGGGAG GTGTTTGACT GGATCCTGGA 
501 CCAGGGCTAC TACTCGGAGC GAGACACAAG CAACGTGGTA CGGCAAGTCC 
551 TGGAGGCCGT GGCCTATTTG CACTCACTCA AGATCGTGCA CAGGAATCTC 
601 AAGCTGGAGA ACCTGGTTTA CTACAACCGG CTGAAGAACT CGAAGATTGT 
651 CATCAGTGAC TTCCATCTGG CTAAGCTAGA AAATGGCCTC ATCAAGGAGC 
701 CCTGTGGGAC CCCCGAGTAT CTGGGCAACC CACCTTTCTA TGAGGAGGTG 

751 GAAGAAGATG ATTATGAGAA CCATGATAAG AATCTCTTCC GCAAGATCCT 
801 GGCTGGTGAC TATGAGTTTG ACTCTCCATA TTGGGATGAT ATTTCGCAGG 

851 CAGCCAAAGA CCTGGTCACA AGGCTGATGG AGGTGGAGCA AGACCAGCGG 
901 ATCACTGCAG AAGAGGCCAT CTCCCATGAG TGGATTTCTG GCAATGCTGC 
951 TTCTGATAAG AACATCAAGG ATGGTGTCTG TGCCCAGATT GAAAAGAACT 

1001 TTGCCAGGGC CAAGTGGAAG AAGGCTGTCC GAGTGACCAC CCTCATGAAA 
1051 CGGCTCCGGG CACCAGAGCA GTCCAGCACG GCTGCAGCCC AGTCGGCCTC 
1101 AGCCACAGAC ACTGCCACCC CCGGGGCTGC AGGTGGGGCC ACAGCTGCAG 
1151 CTGCGAGTGG AGCTACCTCA GCCCCTGAGG GTGATGCTGC TCGTGCTGCA 
1201 AAGAGTGATA ATGTGGCCCC CGCAGACCGT AGTGCCACCC CAGCCACAGA 
1251 TGGAAGTGCC ACCCCAGCCA CTGATGGCAG TGTCACCCCA GCCACCGATG 
1301 GAAGCATCAC TCCAGCCACT GATGGGAGTG TCACCCCAGC CACTGACAGG 
1351 AGCGCTACTC CAGCCACTGA TGGGAGAGCC ACACCAGCCA CAGAAGAGAG 
1401 CACTGTGCCC ACCACCCAAA GCAGTGCCAT GCTGGCCACC AAGGCAGCTG 
1451 CCACCCCTGA GCCGGCTATG GCCCAGCCGG ACAGCACAGC CCCAGAGGGC 
1501 GCCACAGGCC AGGCTCCACC CTCTAGTAAA GGGGAAGAGG CTGCTGGTTA 
1551 TGCCCAGGAG TCTCAAAGGG AGGAGGCCAG CTGAGTAGGC AGCCTGGTGA 
1601 GGGGGGGCAG GGGATGGGCA GGAGGGTGGG AGAGTGGATG AGGGGCTTCT 
1651 CACTGTACAT AGAGTCACTG GCATGATGCC CTCGCTCCCC CATGCCCCCA 
1701 CATCCCAGTG GGGCATAACT AGGGGTCACG GGAGAGCAGT CTCGTCTCCT 
1751 GTGTGTATGT GTGTGAGTGG TGGGCAGGCC AGTGGCAGGG CCGGCCCCAG 
1801 CCCCTGCATG GATTCCTTGT GGCTTTTCTG TCTTTTGCTA GCTTCACCAG 
1851 TTTCTGTTCC TTGTGGGATG CTGCTCTAGG GATACTCAGG GGGCTCCTGC 
1901 TCTCCTTCCC CTTCCCTTCT TGCCTCACCA TTCCCCTAGG CAGGCCCTGC 
1951 AGGTCCCACA CTCTCCCAGG CCCTAAACTT GGGCGGCCTT GCCCTGAGAG 
2001 CTGGTCCTCC AGCGAGGCCC TGTCAGCGGT CTTAGGCTCC TGCACATGAA 
2051 GGTGTGTGCC TGTGGTGTGT GGGCTGCTCT AGGAGCAGAT ACAGGCTGGT 
2101 ATAGAGGATG CAGAAAGGTA GGGCAGTATG TTTAAGTCCA GACTTGGCAC 
2151 ATGGCTAGGG ATACTGCTCA CTAGCTGTGG AGGTCCTCAG GAGTGGAGAG 
2201 AATGAGTAGG AGGGCAGAAG CTTCCATTTT TGTCCTTCCT AAGACCCTGT 
2251 TATTTGTGTT ATTTCCTGCC TTTCCGAGTC CTGCAGTGGG CTGCCCTGTA 
2301 CCCTGAACCT CATGAGCCTC TAAGGGAAAG GAGGAACAAT TAGGACGTGG 
2351 CAATGAGACC TGGCAGGGCA GAGTACAAGC CCAGCACCCA GTGTCCCAGC 
2401 CTTACTGGGT CCTTACCCTG GGCCAAACAG GGAGGGCTGA TACCTCCTTG 
2451 CTCTTCCTAG ATGCCCACCT CCTACAATCT CAGCCCACAA GTCCTCTCCA 
2501 CCCTAGGGGG CTTGCTGCAT GGCAATAACT CATAATCTGA TTTGGAGGTT 
2551 TGCCCTTTAC AGGGGCAGAT TTTCTGCTCA GTTCAACAAT GAAATGAAGA 
2601 GGAACTCCCT CTTTCTACAG CTCACTTCTA TCAGAGGCCC AGGTGCCTCA 
2651 GAGCCACATT GAGTTGCTTT TTCTGGGATG AGGAAGTAGG GTTAAACTCC 
2701 CCAGTTTCCT GAGGGAGGCT CCTGACAGGT GCCCTTTGTC AGACCCTACC 
2751 ACAGCCTGGA TAGGCAGCCA CATTGGTCCT CGCCCTTGCT CGGCACTCCG 
2801 TGGTGGTCCT GCCCTTCTCC CTGCATGCCT GTGGGTCTGC TCTGGTGTGT 
2851 GAAGGTCGGT GGGTTAACTG TGTGCCTACT GAACCTGGCA AATAAACATC 
2901 ACCCTGCAAA GCCAAAAAAA AAA 
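
The nucleotide listing numbers each row by the position of its first base and groups the bases in blocks of ten. A minimal sketch of how such rows could be joined into one string and searched is given below; the helper names `parse_numbered_sequence` and `find_all` are illustrative assumptions and not part of the original annotation pipeline. Applied to row 151 of the listing, it locates the ATG at position 163, which is the ORF start reported for this clone.

```python
import re
from typing import List


def parse_numbered_sequence(listing: str) -> str:
    """Join position-numbered rows into one uppercase nucleotide string.

    Assumption: every letter A/C/G/T/N in the input is sequence; row
    numbers, spaces and blank lines are discarded.
    """
    blocks = []
    for line in listing.splitlines():
        blocks.append("".join(re.findall(r"[ACGTNacgtn]", line)))
    return "".join(blocks).upper()


def find_all(sequence: str, motif: str, first_position: int = 1) -> List[int]:
    """Return the start position of every (possibly overlapping) match.

    first_position is the 1-based coordinate of sequence[0].
    """
    pattern = f"(?={re.escape(motif)})"
    return [m.start() + first_position for m in re.finditer(pattern, sequence)]


# Row 151 of the DKFZphfbr2_82e4 listing, copied from the text.
row_151 = "151 TAGAGCCTGG CAATGCCGTT TGGGTGTGTG ACTCTGGGTG ACAAGAAGAA"
sequence = parse_numbered_sequence(row_151)
atg_positions = find_all(sequence, "ATG", first_position=151)
print(len(sequence), atg_positions)
```

The same `find_all` call with the motif `AATAAA` could be used to check a reported polyadenylation signal, bearing in mind that the reports may count positions slightly differently from this sketch.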



BLAST Results 



Entry HS452352 from database EMBL: 
human STS WI-15318. 
Length = 350 
Minus Strand HSPs: 

Score = 1547 (232.1 bits), Expect = 5.2e-63, P = 5.2e-63 

Identities = 331/348 (95%), Positives = 331/348 (95%), Strand = Minus / Plus 



Medline entries 



94110847: 

J Neurosci 1994 Jan;14(1):1-13 

1G5: a calmodulin-binding, vesicle-associated, protein kinase-like protein enriched in forebrain neurites. 

Godbout M, Erlander MG, Hasel KW, Danielson PE, Wong KK, Battenberg EL, Foye PE, Bloom FE, Sutcliffe JG 



Peptide information for frame 1 



1 MPFGCVTLGD KKNYNQPSEV TDRYDLGQVI KTEEFCEIFR AKDKTTGKLH 

51 TCKKFQKRDG RKVRKAAKNE IGILKMVKHP NILQLVDVFV TRKEYFIFLE 

101 LATGREVFDW ILDQGYYSER DTSNVVRQVL EAVAYLHSLK IVHRNLKLEN 

151 LVYYNRLKNS KIVISDFHLA KLENGLIKEP CGTPEYLGNP PFYEEVEEDD 

201 YENHDKNLFR KILAGDYEFD SPYWDDISQA AKDLVTRLME VEQDQRITAE 

251 EAISHEWISG NAASDKNIKD GVCAQIEKNF ARAKWKKAVR VTTLMKRLRA 

301 PEQSSTAAAQ SASATDTATP GAAGGATAAA ASGATSAPEG DAARAAKSDN 

351 VAPADRSATP ATDGSATPAT DGSVTPATDG SITPATDGSV TPATDRSATP 

401 ATDGRATPAT EESTVPTTQS SAMLATKAAA TPEPAMAQPD STAPEGATGQ 

451 APPSSKGEEA AGYAQESQRE EAS 



ORF from 163 bp to 1581 bp; peptide length: 473 
Category: strong similarity to known protein 
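
The ORF coordinates and the peptide length are tied together by simple arithmetic: the span from the first base of the start codon to the last base of the final sense codon must be a multiple of three. A minimal sketch of that check follows, assuming (as the figures in this report suggest) that the reported end coordinate excludes the stop codon; the function name is hypothetical.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: `end` is the last base of the last sense codon, so the
    stop codon is not counted.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# ORF from 163 bp to 1581 bp, as reported above.
print(orf_peptide_length(163, 1581))
```

For this clone the span is 1419 bp, which corresponds to the 473 residues listed in the peptide information.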



BLASTP hits 
Entry S50193 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - rat 
Length = 374 

Score = 371 (130.6 bits), Expect = 2.2e-66, Sum P(2) = 2.2e-66 
Identities = 74/176 (42%), Positives = 115/176 (65%) 

Entry S57347 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - human 
Length = 370 

Score = 369 (129.9 bits), Expect = 4.6e-66, Sum P(2) = 4.6e-66 
Identities = 74/176 (42%), Positives = 114/176 (64%) 
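
The percentages printed next to the identity and positive counts can be recomputed from the counts themselves. Judging only from the figures in this report (for example 114/176 is about 64.8% but is printed as 64%), the values appear to be truncated rather than rounded; this is an inference from the listing, not a documented property of the search program. A short sketch under that assumption, with a hypothetical helper name:

```python
def hit_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed behaviour)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts taken from the two BLASTP hits above.
for matches, aligned in [(74, 176), (115, 176), (114, 176)]:
    print(f"{matches}/{aligned} ({hit_percent(matches, aligned)}%)")
```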



Alert BLASTP hits for DKFZphfbr2_82e4, frame 1 

PIR: 156542 calmodulin-binding protein - rat, N = 2, Score = 1245, P = 
4e-228 
TREMBLNEW:FRU010348_3 product: "calmodulin binding protein kinase"; 
Fugu rubripes UBE1-like gene, PRGFR2 gene and gene encoding calmodulin 
binding protein kinase, clone 168J21, N = 2, Score = 846, P = 2.6e-139 

TREMBL:RNPRKI_1 product: "protein kinase I"; Rattus norvegicus 
calcium/calmodulin-dependent protein kinase I mRNA, complete cds., N = 
2, Score = 364, P = 5.1e-63 

>PIR: 156542 calmodulin-binding protein - rat 
Length = 504 

HSPs: 

Score = 1246 (186.9 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228 
Identities = 255/289 (88%), Positives = 259/289 (89%) 

Query: 188 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 247 

GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 
Sbjct: 215 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 275 

Query: 248 TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSSTA 307 

TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQS TA 
Sbjct: 276 TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSGTA 335 

Query: 308 AAQSASATDTATPGAAGGATAAAASGATSAPE GDAARAAKSDNVAPADRSAT 359 

A +D ATPGAAGGA AAAA GA A GDA AAKSD++A ADRSAT 

Sbjct: 336 AT SDAATPGAAGGAVAAAAGGAAPASGASATVGTGGDAGCAAKSDDMASADRSAT 390 

Query: 360 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQ 419 

           PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVP  Q 
Sbjct: 391 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPAAQ 450 

Query: 420 SSAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS 473 

SSA A KAAATPEPA+AQPDSTA EGATGQAPPSSKGEEA G AQESQR E S 
Sbjct: 451 SSAAPAAKAAATPEPAVAQPDSTALEGATGQAPPSSKGEEATGCAQESQRVETS 504 

Score = 978 (146.7 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228 
Identities = 186/187 (99%), Positives = 187/187 (100%) 

Query: 1 MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60 

         MPFGCVTLGDKKNYNQPSEVTDRYDLGQV+KTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 
Sbjct: 1 MPFGCVTLGDKKNYNQPSEVTDRYDLGQVVKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60 

Query: 61 RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120 

          RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 
Sbjct: 61 RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120 

Query: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 180 

           DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYKRLKNSKIVISDFHLAKLENGLIKEP 
Sbjct: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYKRLKNSKIVISDFHLAKLENGLIKEP 180 

Query: 181 CGTPEYL 187 

CGTPEYL 
Sbjct: 181 CGTPEYL 187 

Pedant information for DKFZphfbr2_82e4, frame 1 



Report for DKFZphfbr2_82e4.1 

[LENGTH] 473 
[MW] 51208.89 
[pI] 5.30 
[HOMOL] PIR: 156542 calmodulin-binding protein - rat 0.0 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-30 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 2e-26 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 2e-26 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDL101c] 8e-26 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 5e-24 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 7e-23 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 7e-23 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 1e-21 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 1e-21 
[FUNCAT] 

[ FUNCAT ] 

3e-19 

[FUNCAT] 

[ FUNCAT ] 

[FUNCAT] 

[FUNCAT] 

[ FUNCAT ] 

[FUNCAT] 

[FUNCAT] 

[FUNCAT] 

[ 

[FUNCAT] 
[ FUNCAT ] 
[FUNCAT] 
YPL031c] 
[FUNCAT] 
[FUNCAT] 
7e-08 
[ FUNCAT ] 
palmityl 
[ FUNCAT ] 
[FUNCAT] 
[FUNCAT] 
[ FUNCAT ] 
cerevisi 
[FUNCAT] 
5e-06 
[FUNCAT] 
[ FUNCAT ] 
[ FUNCAT ] 
[FUNCAT] 
le-05 
[FUNCAT] 
YNL183c] 
[FUNCAT] 
8e-05 
[FUNCAT] 
[FUNCAT] 
[BLOCKS] 
[BLOCKS] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[EC] 
[EC] 
[EC] 
[EC] 
[EC] 
[EC] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 



11.01 stress response [S. cerevisiae, YDR477w] 3e-19 
01.05.04 regulation of carbohydrate utilization [S. 



cerevisiae, YDR477w] 



99 unclassified proteins [S. cerevisiae, YPL141c] le-16 

03.16 dna synthesis and replication [S. cerevisiae, YMROOlc] 3e-16 

03.13 meiosis [S. cerevisiae, YOR351c] le-15 

30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 3e-14 

10.03.11 key kinases [S. cerevisiae, YCR073c] 6e-ll 
09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 8e-ll 

10.02.11 key kinases [S. cerevisiae, YJL095w] 2e-09 

03.07 pheromone response, mating-type determination, sex-specific proteins 
cerevisiae, YLR362w] le-08 

10.05.11 key kinases [S. cerevisiae, YLR362w] le-08 
10.04.11 key kinases [S. cerevisiae, YLR362w] le-08 

02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, 

04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 7e-08 
01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 



7e-08 



06.07 protein modification (glycolsylation, acylation, myristylation, 
ation, farnesylation and processing) [S. cerevisiae, YFL033c] le-07 

04.99 other transcription activities [S. cerevisiae, YFL033c] le-07 

10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 5e-07 
05.07 translational control [S. cerevisiae, YDR283c] 8e-07 

01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. 
YHR07 9c] 5e-06 

30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR07 9c ] 

30.01 organization of cell wall [S. cerevisiae, YlR019c] le-05 

30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] le-05 

01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] le-05 

04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 



8e-05 



01.02.04 regulation of nitrogen and sulphur utilization 



[S. cerevisiae, 



08.99 other intracellular-transport activities 



[S. cerevisiae, YNL183c] 



dlgol 

dlwfc 

dlkoa_2 
dlkoba_ 

dlphk 

dlirk 

dlapme_ 
dlfgka_ 
dlydre_ 
dl f mk_3 
dlcdkb_ 
d2hcka3 

dlcsn 

dl j sua_ 
dlckia 



1.1.1 
1.1.1 
1.1.1 
1.1.1 
1.1.1 
1.1.2 
1.1.1 
1.1.2 
1.1.1 



03.10 sporulation and germination [S. cerevisiae, YDR52 3c ] 2e-04 
c energy conversion [M. genitalium, MG109] 3e-04 
BL00107A Protein kinases ATP-binding region proteins 
BL00939F 

.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-62 
.8 MAP kinase p38 [human (Homo sapiens) 5e-59 
.7 (1-350) Twitchin, kinase domain [Caenorhabditi le-75 
.6 Twitchin, kinase domain [California sea har le-72 
.5 gamma-subunit of glycogen phosphorylase kinas 4e-65 
.4 insulin receptor [Human (Homo sapiens) 2e-56 
.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 4e-71 
.3 Fibroblast growth factor receptor 1 [human (Horn le-50 
.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 3e-70 
.1.1.2.2 (168-437) c-src tyrosine kinase [human (Horn 5e-49 
.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 2e-72 
.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-46 
.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-42 
.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) le-56 
_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 9e-52 
2.7.1.38 Phosphorylase kinase 3e-29 

2.7.1.123 Ca2+/calmodulin-dependent protein kinase 8e-66 
2.7.1.128 [Acetyl-CoA carboxylase] kinase 2e-17 
2.7.1.117 Myosin-light-chain kinase 2e-38 

2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH) ] kinase 2e-17 

2.7.1.37 Protein kinase 6e-28 

phosphotransferase 8e-66 

nucleus 2e-24 

transferase 8e-30 

calcium 2e-27 

duplication 4e-19 

tandem repeat 2e-31 

phorbol ester binding le-16 

zinc le-16 

cell cycle control 2e-20 

serine/threonine-specif ic protein kinase 8e-66 

phospholipid binding le-16 

autophosphorylation 8e-66 

brain le-14 

heterotetramer 2e-16 

polymer 3e-29 

mitosis 2e-20 

magnesium 7e-22 

ATP 8e-66 

alternative initiators le-29 
[PIRKW] phosphoprotein 8e-66 
[PIRKW] apoptosis 2e-31 
[PIRKW] glycoprotein 4e-19 
[PIRKW] skeletal muscle 3e-28 
[PIRKW] protein kinase 2e-28 
[PIRKW] testis 3e-28 
[PIRKW] signal transduction 1e-21 
[PIRKW] cAMP binding 1e-16 
[PIRKW] purine nucleotide binding 5e-25 
[PIRKW] structural protein 4e-19 
[PIRKW] calcium binding 3e-45 
[PIRKW] alternative splicing 3e-45 
[PIRKW] P-loop 5e-25 
[PIRKW] lipoprotein 2e-16 
[PIRKW] cardiac muscle 4e-19 
[PIRKW] muscle 3e-28 
[PIRKW] myristylation 2e-16 
[PIRKW] EF hand 5e-29 
[PIRKW] cell division 2e-38 
[PIRKW] calmodulin binding 8e-66 
[PIRKW] smooth muscle 7e-31 
[SUPFAM] fibronectin type III repeat homology 7e-31 
[SUPFAM] immunoglobulin homology 7e-31 
[SUPFAM] ribosomal protein S6 kinase II 3e-26 
[SUPFAM] calcium-dependent protein kinase 5e-29 
[SUPFAM] AMP-activated protein kinase 7e-22 
[SUPFAM] protein kinase akt 1e-14 
[SUPFAM] protein kinase SPK1 3e-20 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-36 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-45 
[SUPFAM] calmodulin repeat homology 5e-29 
[SUPFAM] protein kinase DUN1 2e-24 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-14 
[SUPFAM] death-associated protein kinase 2e-31 
[SUPFAM] myosin-light-chain kinase, nonmuscle 1e-29 
[SUPFAM] pleckstrin repeat homology 1e-14 
[SUPFAM] ankyrin repeat homology 2e-31 
[SUPFAM] protein kinase homology 8e-66 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-36 
[SUPFAM] twitchin 1e-18 
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-16 
[SUPFAM] titin 4e-19 
[SUPFAM] protein kinase cdr1 2e-20 
[SUPFAM] kinase-related transforming protein 2e-38 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-66 
[SUPFAM] kinase interaction domain homology 2e-24 
[SUPFAM] protein kinase C mu 1e-16 
[PROSITE] AMIDATION 1 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PFAM] Eukaryotic protein kinase domain 
[KW] All_Alpha 
[KW] 3D 
[KW] LOW_COMPLEXITY 7.40 % 



SEQ MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 

SEG 

Ia06- CEETTTGGGCEEEEEECBCGGGGGEEEEEETTTTCEEEEEEEEC 

SEQ RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 

SEG 

Ia06- HHHHHHHHHCCTTTBCCEEEEEEETTEEEEEECCCCCEEHHHHHHHTTTTBHH 

SEQ DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 

SEG 

Ia06- HHHHHHHHHHHHHHHHHHHCCCTTTTTTTTEEECCCTTTTCEEECCCTTTTCHHHHHCCC 

SEQ CGTPEYLGNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLME 

SEG 

Ia06- HHHHHHHCCTTTTTT THHHHHHHHHCCCCCCTTTTTTTTCHHHHHHHHHHCT 

SEQ VEQDQRITAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRA 

SEG 

Ia06- TTGGGCCCHHHHHHTTTTTTCCCCCCBHHHHHHHHHHHHHCCTTTTTTBTHHHHHHHC. . 

SEQ PEQSSTAAAQSASATDTATPGAAGGATAAAASGATSAPEGDAARAAKSDNVAPADRSATP 

SEG . . XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

Ia06- 
SEQ ATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQS 

SEG 

Ia06- 

SEQ SAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS 

SEG 

Ia06- 



Prosite for DKFZphfbr2_82e4.1 

PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005 
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 91->94 PKC_PHOSPHO_SITE PDOC00005 
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005 
PS00005 118->121 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 264->267 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 454->457 PKC_PHOSPHO_SITE PDOC00005 
PS00005 467->470 PKC_PHOSPHO_SITE PDOC00005 
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 118->122 CK2_PHOSPHO_SITE PDOC00006 
PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006 
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006 
PS00006 455->459 CK2_PHOSPHO_SITE PDOC00006 
PS00006 467->471 CK2_PHOSPHO_SITE PDOC00006 
PS00007 456->464 TYR_PHOSPHO_SITE PDOC00007 
PS00007 127->136 TYR_PHOSPHO_SITE PDOC00007 
PS00008 260->266 MYRISTYL PDOC00008 
PS00008 321->327 MYRISTYL PDOC00008 
PS00008 324->330 MYRISTYL PDOC00008 
PS00009 59->63 AMIDATION PDOC00009 



Pfam for DKFZphfbr2_82e4.1 



HMM_NAME 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 



Eukaryotic protein kinase domain 



24 



1 YeigRilGeGsFGtVYkCiWr . TGelVAIKIIkkrsms FIREIq 

Y +G++I F ++++++++ TG++ K++ KR+ + +EI 

ydlgqvikteefceifrakdkttgklhtckkfqkrdgrkvrkaakneig 



I rf IMyQI LrGMe YLHSMgl IHRDLKPENILIDeN . . . gqIKIcDFGLAR 
++++Q+L++++YLHS +I+HR LK EN+ + ++ I I+DF LA+ 

122 TSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAK 



qMnnYerMttfCGTPWY* 
+ N ++ + CGTP+Y 
172 LEN — GLIKEPCGTPEY 



186 



16 



*GepPFyd dnMeralmrliqrf rrpf WpnCSeElyDFMr 

G PPFY+ + +++I++++++F +P+W+ +S ++D+++ 

GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVT 



wCWnyDPekRPTFrQILnHPWF* 
+++++ ++R+T+++++ H W+ 
237 RLMEVEQDQRITAEEAISHEWI 



258 



72 



IMRrLnHPNI IRFYDwFedddDHI YMIMEYMeGGDLFDYIrrngpMsEwe 
I+++++HPNI+++ D+F + +++ + +E++ G + FD+I ++G++SE++ 
73 ILKMVKHPNILQLVDVFV-TRKEYFIFLELATGREVFDWILDQGYYSERD 121 



171 



236 
group: transmembrane protein 

DKFZphfbr2_82g14 encodes a novel 208 amino acid proline-rich protein without similarity to 
known proteins. 

The protein contains one transmembrane domain. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



unknown proline-rich protein 
membrane regions: 1 

Summary DKFZphfbr2_82g14 encodes a novel 208 amino acid protein. 



unknown proline-rich protein 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="26.2 cR from top of Chr16 linkage group" 
Insert length: 2059 bp 

Poly A stretch at pos. 2049, polyadenylation signal at pos. 2024 



1 AGAAGTGCGA CTGCCAGCTG CCGAGGCGTT CGGTCCTGCT GTTGCGGCCG 
51 CTGCCCCAGG GCTGCGGGGA CGCTCCCGGA GCCCTGCCTG TCCCCTGTCC 
101 ATCCAGGCCA GCAGCTGAAG GAGCCTCACC TGCCTCCCTT CTCTGAGTAG 
151 CACGGATTTG AGGAGAAGCA GCGAAGATGT CCAGCGAGCC TCCCCCTCCT 
201 TATCCTGGGG GCCCCACAGC CCCACTTCTG GAAGAGAAAA GTGGAGCCCC 
251 GCCCACCCCA GGCCGTTCCT CCCCAGCTGT GATGCAGCCC CCTCCAGGCA 
301 TGCCACTGCC CCCTGCGGAC ATTGGCCCCC CACCCTATGA GCCGCCGGGT 
351 CACCCAATGC CCCAGCCTGG CTTCATCCCA CCACACATGA GTGCAGATGG 
401 CACCTACATG CCTCCGGGTT TCTACCCTCC TCCAGGCCCC CACCCACCCA 
451 TGGGCTACTA CCCCCCAGGG CCCTACACGC CAGGGCCCTA CCCTGGCCCT 
501 GGGGGCCACA CAGCCACAGT CCTGGTCCCT TCAGGAGCTG CCACCACGGT 
551 GACAGTGCTG CAGGGAGAGA TCTTTGAGGG AGCGCCTGTG CAGACGGTGT 
601 GTCCCCACTG CCAGCAGGCC ATCGCCACCA AGATCTCCTA CGAGATTGGC 
651 TTGATGAATT TCGTGCTGGG TTTCTTCTGT TGCTTCATGG GATGTGATCT 
701 GGGCTGCTGC CTGATCCCCT GCCTCATCAA TGACTTCAAG GATGTGACGC 
751 ACACATGCCC CAGCTGCAAA GCCTACATCT ACACGTACAA GCGCCTGTGC 
801 TAACGGAGCT GGGACTCGGG ACTCCCCCGC CTGTCAGTCT GGCCCCCTGT 
851 GCTTTGCTCC CTGCGCTCAG TGGTCACTTT CCCGCTCCCA CTTGGGGCTG 
901 GGAGCCGTGC CACCATCCCC TAGAAGTCCT GTCCTCTTCA CCCTGCCCTA 
951 CCTGAGCCGC TGACTCTTCT GGCAAAAATT CTGTTGGGAT TTAAGGCCAA 
1001 GGGTCAGTGG GTGGCAGGGG GCTGGCAATG AGCTTGTGTG TTGTTGGTCT 
1051 GCTTGGTGTG TGTGATCGGG AAGATAAGCT GGGAGGGGTC TCCTGCTGGG 
1101 GTCCTGATGC CTCTGTTTCC AAACAAGGTA CAGGTTCAGT CCAGACTCTT 
1151 TCCCCCTGGG ACCAACAGCA GCCAGAGCAG TTAGCCAGTT AGTCCCCAGG 
1201 CCTGTGGCCA CAGGCGTTTC TGACCTGCTG GGCCGAGAAT GGGTAAGTTG 
1251 TCTGGAGTCA GGTGGGCCCA CGTAGGACAG GGTCACAAAG CCTGGGTTTG 
1301 TTTCTGGGTA CTTTGCGCCT CTGGGGTGCT AGACGTGGGG CATGGTGGCT 
1351 GGAAGTAAAA CTGCCAACTC TGGCCCTCAG AACTCTCAGG TATAGAAGCC 
1401 CAGGATGTCT AATACCCTGT CCCAGTGCCC GAGAGCTGCC TGGTGTCAGG 
1451 TAGAGAGGAC ACTGTACCTG GGTGAATGAT CAGACCCTGG TAGCTAAGAA 
1501 GGAACTTGTC CCTTTGAGTC AGTGTGCAGA CCCCCTTTCA GGCCATGCCT 
1551 CTGTGAACCC TGTATTGCTG GGGCCGGAAG GAGCCCCTGA GCCTAGCCCC 
1601 TTCCCGTCTG CCCTGTGTCC TCACTGCGTG TGGGTATGAC CTCTGCCTGG 
1651 TGGCTGGTGT ATCCCAACTG GGCAAGAGAT GGCAGAGGGT CCCCCTTGTG 
1701 GGTGCGCTTG GATGTGCAGA GCCTTCTCCA TGGATTTTCT TCCCTGTAAG 
1751 TGCCGGGCCC CCCACCCCAG CTGACAGGCT GTTGCTGTGC CTGCTCACAC 
1801 CTGCTCCTGC AGGCACACTG GGCTAGGGAC GAGGAAGGAG CAGCCACAAG 
1851 TGGTAGAACT GCCTTGGTGG ACACCAGCCT CGCCCTGTCT TTATTTCCTG 
1901 AATGGTTTGT GAACTTGCTC ACCTGGACCA CTGTATCCTG CCACTGTCCT 
1951 TCCTGGTCTC GCACTGCCAC TGCATGGCCT CCTGTCACTG TGAATCGTGG 
2001 CCCAGTCTCA GTTTGTAGTT TCTCATTAAA TTGGCCCTTT CACTCCCCCA 
2051 AAAAAAAAA 



BLAST Results 
Entry HS727347 from database EMBL: 
human STS WI-16589. 
Length = 275 
Plus Strand HSPs: 

Score = 1365 (204.8 bits), Expect = 3.0e-55, P = 3.0e-55 

Identities = 275/276 (99%), Positives = 275/276 (99%), Strand = Plus / Plus 



Medline entries 



No Medline entry 



Peptide information for frame 3 



1 MSSEPPPPYP GGPTAPLLEE KSGAPPTPGR SSPAVMQPPP GMPLPPADIG 

51 PPPYEPPGHP MPQPGFIPPH MSADGTYMPP GFYPPPGPHP PMGYYPPGPY 

101 TPGPYPGPGG HTATVLVPSG AATTVTVLQG EIFEGAPVQT VCPHCQQAIA 

151 TKISYEIGLM NFVLGFFCCF MGCDLGCCLI PCLINDFKDV THTCPSCKAY 
201 IYTYKRLC 

ORF from 177 bp to 800 bp; peptide length: 208 
Category: similarity to known protein 
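
The peptide above follows from reading the cDNA in codons from the ORF start. The sketch below illustrates this with the standard genetic code, which is assumed here because the report does not name a translation table; the `translate` helper is hypothetical. The example input is the first eight codons of the ORF (bases 177 to 200 of the nucleotide listing), which translate to the first eight residues of the peptide.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*" and stop_at_stop:
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 177 to 200 of the DKFZphfbr2_82g14 cDNA.
orf_start = "ATGTCCAGCGAGCCTCCCCCTCCT"
print(translate(orf_start))
```

Translating the full span from base 177 to base 800 in the same way would be expected to give the 208 residue peptide, with the TAA at 801 acting as the stop codon.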



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82g14, frame 3 

PIR:S57447 HPBRII-7 protein - human, N = 1, Score = 206, P = 8.4e-16 

PIR:A47655 spliceosome-associated protein SAP 62 - human, N = 1, Score 
= 198, P = 4.3e-15 



>PIR:S57447 HPBRII-7 protein - human 
Length = 551 

HSPs: 



Score = 206 (30.9 bits), Expect = 8.4e-16, P = 8.4e-16 
Identities = 57/115 (49%), Positives = 62/115 (53%) 



Query: 5 PPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPP PYEP 56 
         PPPP+P G T P G P PG P PPPG LPP GPP P P 
Sbjct: 226 PPPPFPAGQTPP--RPPLGPPGPPGPPGP PPPGQVLPPPLAGPPNRGDRPPPPVLF 279 

Query: 57 PGHPMPQP--GFIPPHMSADGTYMP-PGFYPPPGPHPPM-GYYPP-GPYTPGPYPGPGGH 111 
          PG P QP G +PF G P PG+ PPPGP PP G PP GP+ P P PGP G 
Sbjct: 280 PGQPFGQPPLGPLPP GPPPPVPGYGPPPGPPPPQQGPPPPPGPFPPRP-PGPLGP 333 

Query: 112 TATVLVP 118 
           T+ P 
Sbjct: 334 PLTLAPP 340 

Score = 177 (26.6 bits), Expect = 1.1e-12, P = 1.1e-12 
Identities = 55/120 (45%), Positives = 61/120 (50%) 

Query: 5 PPPPYPGGPTAP--LLEEKSGAPPTPG-RSS PAVM QP PPGMPLPPADIGPPPYE 55 
         P PP P GP P +L PP G R P V+ QP PP PLPP GPPP 
Sbjct: 244 PGPPGPPGPPPPGQVLPPPLAGPPNRGDRPPPPVLFPGQPFGQPPLGPLPP GPPP-P 299 

Query: 56 PPGHPMPQPGFIPPHMSADGTYMPPGFYPP--PGP-HPPMGYYPPGPYTPGPYPG PG 109 
          PG+ P PG PP G PPG +PP PGP PP+ PP P+ PGP PG P 
Sbjct: 300 VPGYG-PPPGPPPPQQ GPPPPPGPFPPRPPGPLGPPLTLAPP-PHLPGPPPGAPPPA 354 

Query: 110 GHTATVLVP 118 

           H P 
Sbjct: 355 PHVNPAFFP 363 

Score = 168 (25.2 bits), Expect = 1.1e-11, P = 1.1e-11 
Identities = 47/118 (39%), Positives = 51/118 (43%) 

Query: 5 PPPPYPG-GPTAPLLEEKSGAPPTPGRSSPAVMQP--PPGMPLPPADI-GPPPYEPPGHP 60 
         PPPP PG GP + G PP PG P P PP PP + GPPP PP 
Sbjct: 296 

Query: 61 
          P F PP ++ MP P P P G PP PY G Y PG T P 
Sbjct: 356 HVNPAFFPPPTNSG MPTSDSRGPPPTDPYGR-PP-PYDRGDYGPPGREMDTARTPLS 410 

Query: 121 
Sbjct: 411 



Score = 156 (23.4 bits), Expect = 2.1e-10, P = 2.1e-10 
Identities = 44/103 (42%), Positives = 50/103 (48%) 

Query: 6 PPPYPGGPTAPLLEEKSGAPPT-PGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHPMPQP 64 

         P PGG P G PP P +P +PP G P PP GPPP PG +P P 
Sbjct: 208 PGAVPGGDRFPGPAGPGGPPPPFPAGQTPP--RPPLGPPGPPGPPGPPP PGQVLPPP 262 

Query: 65 GFIPPHMSADGTYMPPGFYP-PPGPHPPMGYYPPGPYTP GPYPGP 108 

          PP+ D PP +P P BP+G PPGP P GP PGP 
Sbjct: 263 LAGPPNRG-DRP-PPPVLFPGQPFGQPPLGPLPPGPPPPVPGYGPPPGP 309 

Score = 121 (18.2 bits), Expect = 5.2e-05, P = 5.2e-05 
Identities = 40/90 (44%), Positives = 45/90 (50%) 

Query: 23 GAPPTPGRSSPAVMQPP-PGMPLPPAD-IGPP-PYEPPGHPMPQPG-FIPPHMSADGTYM 78 

          G PG + P PP P PP +GPP P PPG P P PG +PP ++ 
Sbjct: 213 GGDRFPGPAGPGGPPPPFPAGQTPPRPPLGPPGPPGPPG-P-PPPGQVLPPPLAG 265 

Query: 79 PP--GFYPPPG PHPPMGYYPPGPYTPGPYPG-PG 109 

          PP G PPP P P G P GP PGP P PG 
Sbjct: 266 PPNRGDRPPPPVLFPGQPFGQPPLGPLPPGPPPPVPG 302 



Pedant information for DKFZphfbr2_82g14, frame 3 



Report for DKFZphfbr2_82g14.3 



[LENGTH] 208 

[MW] 21862.47 

[pI] 5.55 

[PROSITE] MYRISTYL 3 

[PROSITE] PKC_PHOSPHO_SITE 2 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 39.90 % 



SEQ MSSEPPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHP 

SEG . . . .xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccchhhhhhccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccceeeeecccc 

MEM 



SEQ AATTVTVLQGEIFEGAPVQTVCPHCQQAIATKISYEIGLMNFVLGFFCCFMGCDLGCCLI 

SEG 

PRD cceeeeeeeeeeecccceeeeccchhhhhhhhhhhhhhhceeeeeeeeeecccccceeec 

MEM MMMMMMMMMMMMM 



SEQ PCLINDFKDVTHTCPSCKAYIYTYKRLC 

SEG 

PRD eeeecccccccccccccceeeeeeeccc 

MEM MMMM 



Prosite for DKFZphfbr2_82g14.3 

PS00005 196->199 PKC_PHOSPHO_SITE PDOC00005 

PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005 

PS00008 109->115 MYRISTYL PDOC00008 

PS00008 120->126 MYRISTYL PDOC00008 

PS00008 172->178 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_82g14.3) 






WO 01/12659 



PCT/IB00/01496 



DKFZphfbr2_82i17 



group: signal transduction 

DKFZphfbr2_82i17 encodes a novel 95 amino acid protein with similarity to the plasma membrane 
substrate for the cAMP-dependent protein kinase. 

The novel protein is a transmembrane protein with strong similarity to the phospholemman 
protein, a membrane substrate for the cAMP-dependent protein kinase. It seems to serve as a 
chloride channel or as a chloride-channel regulator. 

The new protein can find application in modulating/blocking cAMP-dependent protein kinase-dependent 
pathways. 



similarity to plasma membrane substrate for cAMP-dependent protein kinase 
complete cDNA, complete cds, EST hits 

potential start at Bp 31 matches Kozak consensus PyNNatgG 
might be a SODIUM/POTASSIUM-TRANSPORTING ATPASE 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="11; 920_E_12; 786_(A,H)_11; (797,802)_(E,H)_7" 
Insert length: 1647 bp 

Poly A stretch at pos. 1637, polyadenylation signal at pos. 1615 



1 AGTCTCGGAG GGGACCGGCT GTGCAGACGC CATGGAGTTG GTGCTGGTCT 
51 TCCTCTGCAG CCTGCTGGCC CCCATGGTCC TGGCCAGTGC AGCTGAAAAG 
101 GAGAAGGAAA TGGACCCTTT TCATTATGAT TACCAGACCC TGAGGATTGG 
151 GGGACTGGTG TTCGCTGTGG TTCTCTTCTC GGTTGGGATC CTCCTTATCC 
201 TAAGTCGCAG GTGCAAGTGC AGTTTCAATC AGAAGCCCCG GGCCCCAGGA 
251 GATGAGGAAG CCCAGGTGGA GAACCTCATC ACCGCCAATG CAACAGAGCC 
301 CCAGAAAGCA GAGAACTGAA GTGCAGCCAT CAGGTGGAAG CCTCTGGAAC 
351 CTGAGGCGGC TGCTTGAACC TTTGGATGCA AATGTCGATG CTTAAGAAAA 
401 CCGGCCACTT CAGCAACAGC CCTTTCCCCA GGAGAAGCCA AGAACTTGTG 
451 TGTCCCCCAC CCTATCCCCT CTAACACCAT TCCTCCACCT GATGATGCAA 
501 CTAACACTTG CCTCCCCGCT GCAGCCTGTG GTCCTGCCCA CCTCCCGTGA 
551 TGTGTGTGTG TGTGTGTGTG TGTGTGACTG TGTGTGTTTG CTAACTGTGG 
601 TCTTTGTGGC TACTTGTTTG TGGATGGTAT TGTGTTTGTT AGTGAACTGT 
651 GGACTCGCTT TCCCAGGCAG GGGCTGAGCC ACACGGCCAT CTGCTCCTCC 
701 CTGCCCCCGT GGCCCTCCAT CACCTTCTGC TCCTAGGAGG CTGCT7GTTG 
751 CCCGAGACCA GCCCCCTCCC CTGATTTAGG GATGCGTAGG GTAAGAGCAC 
801 GGGCAGTGGT CTTCAGTCGT CTTGGGACCT GGGAAGGTTT GCAGCACTTT 
851 GTCATCATTC TTCATGGACT CCTTTCACTC CTTTAACAAA AACCTTGCTT 
901 CCTTATCCCA CCTGATCCCA GTCTGAAGGT CTCTTAGCAA CTGGAGATAC 
951 AAAGCAAGGA GCTGGTGAGC CCAGCGTTGA CGTCAGGCAG GCTATGCCCT 
1001 TCCGTGGTTA ATTTCTTCCC AGGGGCTTCC ACGAGGAGTC CCCATCTGCC 
1051 CCGCCCCTTC ACAGAGCGCC CGGGGATTCC AGGCCCAGGG CTTCTACTCT 
1101 GCCCCTGGGG AATGTGTCCC CTGCATATCT TCTCAGCAAT AACTCCATGG 
1151 GCTCTGGGAC CCTACCCCTT CCAACCTTCC CTGCTTCTGA GACTTCAATC 
1201 TACAGCCCAG CTCATCCAGA TGCAGACTAC AGTCCCTGCA ATTGGGTCTC 
1251 TGGCAGGCAA TAGTTGAAGG ACTTCCTGTT CCGTTGGGGC CAGCACACCG 
1301 GGATGGATGG AGGGAGAGCA GAGGCCTTTG CTTCTCTGCC TACGTCCCCT 
1351 TAGATGGGCA GCAGAGGCAA CTCCCGCATC CTTTGCTCTG CCTGTCAGTG 
1401 GTCAGAGCGG TGAGCGAGGT GGGTTGGAGA CTCAGCAGGC TCCGTGCAGC 
1451 CCTTGGGAAC AGTGAGAGGT TGAAGGTCAT AACGAGAGTG GGAACTCAAC 
1501 CCAGATCCCG CCCCTCCTGT CCTCTGTGTT CCCGCGGAAA CCAACCAAAC 
1551 CGTGCGCTGT GACCCATTGC TGTTCTCTGT ATCGTGACCT ATCCTCAACA 
1601 ACAACAGAAA AAAGGAATAA AATATCCTTT GTTTCCTAAA AAAAAAA 



BLAST Results 



Entry HS31455 from database EMBL: 
human STS WI-2739. 
Length = 103 
Minus Strand HSPs: 

Score = 487 (73.1 bits), Expect = 4.4e-14, P = 4.4e-14 

Identities = 101/104 (97%), Positives = 101/104 (97%), Strand = Minus / 
Plus 

frame shift in primer binding site 
Medline entries 



91250422: 

Purification and complete sequence determination of the major plasma membrane substrate 
for cAMP-dependent protein kinase and protein kinase C in myocardium. 

95091702: 

Protein kinase C and cyclic AMP-dependent protein kinase phosphorylate phospholemman, 
an insulin and adrenaline-regulated membrane phosphoprotein, at specific sites in the 
carboxy terminal domain. 

95138184: 

Mat-8, a novel phospholemman-like protein expressed in human breast tumors, induces a 
chloride conductance in Xenopus oocytes. 



Peptide information for frame 2 



1 MELVLVFLCS LLAPMVLASA AEKEKEMDPF HYDYQTLRIG GLVFAVVLFS 
51 VGILLILSRR CKCSFNQKPR APGDEEAQVE NLITANATEP QKAEN 

ORF from 32 bp to 316 bp; peptide length: 95 
Category: strong similarity to known protein 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82il7, frame 2 

SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. , N = 1, Score = 196, P = 
1.2e-15 

TREMBL:AF091390_1 product: "phospholemman precursor"; Mus musculus
phospholemman precursor, gene, complete cds., N = 1, Score = 187, P =
1.1e-14

PIR:A40533 cAMP-dependent protein kinase major membrane substrate 
precursor - dog, N = 1, Score = 189, P = 6.5e-15 

SWISSPROT:PLM_RAT PHOSPHOLEMMAN PRECURSOR. , N = 1, Score = 185, P = 
1.7e-14 



>SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 
Length = 92 

HSPs : 

Score = 196 (29.4 bits), Expect = 1.2e-15, P = 1.2e-15
Identities = 43/85 (50%), Positives = 56/85 (65%) 

Query:  4 VLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRRCKC 63
          +LVF   LL      + AE  KE DPF YDYQ+L+IGGLV A +LF +GIL++LSRRC+C
Sbjct:  7 ILVFCVGLLT----MAKAESPKEHDPFTYDYQSLQIGGLVIAGILFILGILIVLSRRCRC 62

Query: 64 SFNQKPRA--PGDEEAQVENLITANAT 88
           FNQ+ R   P +EE    + I   +T
Sbjct: 63 KFNQQQRTGEPDEEEGTFRSSIRRLST 89



Pedant information for DKFZphfbr2_82il7, frame 2 



Report for DKFZphfbr2_82il7.2



[LENGTH] 95

[MW] 10542.37

[pI] 5.05

[HOMOL] SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 3e-15

[BLOCKS] BL01310

[EC] 3.6.1.37 Na+/K+-exchanging ATPase 6e-08




LIul 1 OlllCllLL_>-L ClllC ULUtCJ.ll 


le-09 


[PIRKW] hydrolase 6e-08

[PROSITE] ATP1G1_PLM_MAT8 1

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 2

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 19





SEQ MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRR 

PRD ccchhhhhhhhhhccccccccccccccccccceeeeecccceeeehhhhhhheeeeehhh 

SEQ CKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN 

PRD hhhcccccccccccchhhhhhhhhhhccccccccc 



Prosite for DKFZphfbr2_82il7.2

PS00001 86->90 ASN_GLYCOSYLATION PDOC00001

PS00005 36->39 PKC_PHOSPHO_SITE PDOC00005

PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005

PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006

PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007

PS00008 41->47 MYRISTYL PDOC00008

PS01310 28->42 ATP1G1_PLM_MAT8 PDOC01014



(No Pfam data available for DKFZphfbr2_82il7.2)
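
Prosite site predictions of the kind tabulated above can be reproduced with ordinary regular expressions. The sketch below is a simplified illustration, not the Prosite scanner itself: it translates two published Prosite patterns (PS00001, N-{P}-[ST]-{P}, and PS00005, [ST]-x-[RK]) into Python regular expressions and reports 1-based start positions on the frame 2 peptide given earlier in this report. The dictionary and function names are hypothetical.

```python
import re

# Frame 2 peptide of this clone, copied from the peptide listing above.
PEPTIDE = (
    "MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFS"
    "VGILLILSRRCKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN"
)

# Prosite syntax rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(peptide: str, regex: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    lookahead = re.compile(f"(?=({regex}))")
    return [m.start() + 1 for m in lookahead.finditer(peptide)]


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan(PEPTIDE, regex))
```

The start positions found this way agree with the first coordinate of the corresponding rows in the table; the second coordinate in the table appears to be the position just past the match.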
DKFZphfbr2_82i24



group: nucleic acid management 

DKFZphfbr2_82i24 encodes a novel 547 amino acid protein with similarity to DEAD-box 
superfamily ATP-dependent helicases. 

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP
hydrolysis.

The novel protein contains a DEAD box, an ATP/GTP-binding site motif A (P-loop, interacting
with one of the phosphate groups of the nucleotide) and a leucine zipper. Mutations in the
closely related Drosophila Hlc gene result in lethality in homozygotes. Therefore, the new
protein seems to be critically involved in RNA processing in eukaryotic cells.

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to DEAD-box subfamily ATP-dependent helicase 
complete cDNA, complete cds, EST hits 

potential Start at Bp 9 matches Kozak consensus PyNNatgG, 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 

Sequenced by DKFZ 

Locus: /map="720_A_3; 758_H_4; 772_E_3; 804_A_5; 175.5 cR from top FT of Chr7 linkage group"
Insert length: 1860 bp

Poly A stretch at pos. 1850, polyadenylation signal at pos. 1829



1 AGCAGCGCCA TGGAGGACTC TGAAGCACTG GGCTTCGAAC ACATGGGCCT
51 CGATCCCCGG CTCCTTCAGG CTGTCACCCA TCTGGGCTGG TCGCCACCTA
101 CGCTGATCCA GGAGAAGGCC ATCCCACTGG CCCTAGAAGG GAAGGACCTC
151 CTGGCTCGGG CCCGCACGGG CTCCGGGAAG ACGGCCGCTT ATGCTATTCC
201 GATGCTGCAG CTGTTGCTCC ATAGGAAGGC GACAGGTCCG GTGGTAGAAC
251 AGGCAGTGAG AGGCCTTGTT CTTGTTCCTA CCAAGGAGCT GGCACGGGAA
301 GCACAGTCCA TGATTCAGCA GCTGGCTACC TACTGTGCTC GGGATGTCCG
351 AGTGGCCAAT GTCTCAGCTG CTGAAGACTC AGTCTCTCAG AGAGCTGTGC
401 TGATGGAGAA GCCAGATGTG GTAGTAGGGA CCCCATCTCG CATATTAAGC
451 CACTTGCAGC AAGACAGCCT GAAACTTCGT GACTCCCTGG AGCTTTTGGT
501 GGTGGACGAA GCTGACCTTC TTTTTTCCTT TCGCTTTGAA GAAGAGCTCA
551 AGAGTCTCCT CTGTCACTTG CCCCGGATTT ACCAGGCTTT TCTCATGTCA
601 GCTACTTTTA ACGAGGACGT ACAAGCACTC AAGGAGCTGA TATTACATAA
651 CCCGGTTACC CTTAAGTTAC AGGAGTCCCA GCTGCCTGGG CCAGACCAGT
701 TACAGCAGTT TCAGGTGGTC TGTGAGACTG AGGAAGACAA ATTCCTCCTG
751 CTGTATGCCC TGCTCAAGCT GTCATTGATT CGGGGCAAGT CTCTGCTCTT
801 TGTCAACACT CTAGAACGGA GTTACCGGCT ACGCCTGTTC TTGGAACAGT
851 TCAGCATCCC CACCTGTGTG CTCAATGGAG AGCTTCCACT GCGCTCCAGG
901 TGCCACATCA TCTCACAGTT CAACCAAGGC TTCTACGACT GTGTCATAGC
951 AACTGATGCT GAAGTCCTGG GGGCCCCAGT CAAGGGCAAG CGTCGGGGCC
1001 GAGGGCCCAA AGGGGACAAG GCCTCTGATC CGGAAGCAGG TGTGGCCCGG
1051 GGCATAGACT TCCACCATGT GTCTGCTGTG CTCAACTTTG ATCTTCCCCC
1101 AACCCCTGAG GCCTACATCC ATCGAGCTGG CAGGACAGCA CGCGCTAACA
1151 ACCCAGGCAT AGTCTTAACC TTTGTGCTTC CCACGGAGCA GTTCCACTTA
1201 GGCAAGATTG AGGAGCTTCT CAGTGGAGAG AACAGGGGCC CCATTCTGCT
1251 CCCCTACCAG TTCCGGATGG AGGAGATCGA GGGCTTCCGC TATCGCTGCA
1301 GGGATGCCAT GCGCTCAGTG ACTAAGCAGG CCATTCGGGA GGCAAGATTG
1351 AAGGAGATCA AGGAAGAGCT TCTGCATTCT GAGAAGCTTA AGACATACTT
1401 TGAAGACAAC CCTAGGGACC TCCAGCTGCT GCGGCATGAC CTACCTTTGC
1451 ACCCCGCAGT GGTGAAGCCC CACCTGGGCC ATGTTCCTGA CTACCTGGTT
1501 CCTCCTGCTC TCCGTGGCCT GGTACGCCCT CACAAGAAGC GGAAGAAGCT
1551 GTCTTCCTCT TGTAGGAAGG CCAAGAGAGC AAAGTCCCAG AACCCACTGC
1601 GCAGCTTCAA GCACAAAGGA AAGAAATTCA GACCCACAGC CAAGCCCTCC
1651 TGAGGTTGTT GGGCCTCTCT GGAGCTGAGC ACATTGTGGA GCACAGGCTT
1701 ACACCCTTCG TGGACAGGCG AGGCTCTGGT GCTTACTGCA CAGCCTGAAC
1751 AGACAGTTCT GGGGCCGGCA GTGCTGGGCC CTTTAGCTCC TTGGCACTTC
1801 CAAGCTGGCA TCTTGCCCCT TGACAACAGA ATAAAAATTT TAGCTGCCCC
1851 AAAAAAAAAA



BLAST Results 



WO 01/12659 



PCT/IB00/01496 



Entry HSG05793 from database EMBL: 
human STS WI-6581. 
Length = 206 
Minus Strand HSPs: 

Score = 992 (148.8 bits), Expect = 6.0e-38, P = 6.0e-38 

Identities = 204/208 (98%), Positives = 204/208 (98%), Strand = Minus /
Plus

Entry AC004938 from database EMBL: 

Homo sapiens clone DJ0971C03; HTGS phase 1, 18 unordered pieces. 
Score = 1269, P = 6.5e-202, identities = 269/282 
12 exons Bp -87920-93706 (matching 1-1497) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 1650 bp; peptide length: 547 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (51-59) 
LEUCINE_ZIPPER (149-171) 



1 MEDSEALGFE HMGLDPRLLQ AVTDLGWSRP TLIQEKAIPL ALEGKDLLAR

51 ARTGSGKTAA YAIPMLQLLL HRKATGPVVE QAVRGLVLVP TKELARQAQS

101 MIQQLATYCA RDVRVANVSA AEDSVSQRAV LMEKPDVVVG TPSRILSHLQ

151 QDSLKLRDSL ELLVVDEADL LFSFGFEEEL KSLLCHLPRI YQAFLMSATF

201 NEDVQALKEL ILHNPVTLKL QESQLPGPDQ LQQFQVVCET EEDKFLLLYA

251 LLKLSLIRGK SLLFVNTLER SYRLRLFLEQ FSIPTCVLNG ELPLRSRCHI

301 ISQFNQGFYD CVIATDAEVL GAPVKGKRRG RGPKGDKASD PEAGVARGID

351 FHHVSAVLNF DLPPTPEAYI HRAGRTARAN NPGIVLTFVL PTEQFHLGKI

401 EELLSGENRG PILLPYQFRM EEIEGFRYRC RDAMRSVTKQ AIREARLKEI

451 KEELLHSEKL KTYFEDNPRD LQLLRHDLPL HPAVVKPHLG HVPDYLVPPA

501 LRGLVRPHKK RKKLSSSCRK AKRAKSQNPL RSFKHKGKKF RPTAKPS



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82i24, frame 1 

TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby
sox (bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa
(lcs) genes, complete cds., N = 1, Score = 1230, P = 3.2e-125

TREMBL:SPCC1494_6 gene: "SPCC1494.06c"; product: "atp dependent
helicase"; S.pombe chromosome II cosmid c1494., N = 2, Score = 753, P =
2.5e-113

PIR:S51412 hypothetical protein YLR276c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 711, P = 8.2e-117

TREMBL:AF025451_2 gene: "C24H12.4"; Caenorhabditis elegans cosmid
C24H12., N = 2, Score = 564, P = 2.7e-99



>TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila 

melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen), 
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby sox 
(bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa (lcs) 
genes, complete cds. 
Length = 560 

HSPs : 

Score = 1230 (184.5 bits), Expect = 3.2e-125, P = 3.2e-125 
Identities = 251/497 (50%), Positives = 344/497 (69%) 
Query :




IT IT LI MPT HDD T T Pi7A VTFlT rffl^RPTT TflFKflT PT HI VCZK nT T 7A R Zl RTP.QPTCTZ1 7A VZ1 T PMT PiT 
C iLitll l^LiL/r tSLiLiS^M V 1 UboWOixc 1 Li i- St t\rt X c 1j f-t J_i £j\j i\ LJI_tLinI\nf\ 1 o VjIV 1 r-Vrt I L It1*.1.LjS«:Li 


68 






it i t r> D _i_T xnw T PUT J.DT7 T /~1 fiTDT T T7PVri4-4- DBDTTCrifT'Tl VA4-D4-4-P1 
t + LiD K+Jj+H V LioW Tr 1 lily Al trli LitAjlS.IJT-r KHK1 IjoolS. J. A lAtt^tty 




Sbjct: 


11 


FHELELDQRILKAVAQLGWQQPTLIQSTAIPLLLEGKDVVVRARTGSGKTATYALPLIQK 


70 


Query : 


d y 


T T UDK'n r T 1 PD\A7irnZl\/PPTVT VPTTflTT ttRPiaPl^MT PlPiT TAT VP 7A R T"1\7R VZUvH/C! — Q Zl PFl^U D 


127 






4-T J- V CP \7 4-VT DTK VI RPi4-4- + T4-PtT P 4- 17RW7A-J- + + 4-4- H4-U4-PI 




Sbjct: 


71 


ilnsklnas--eqyvsavvlaptkelcrqsrkvieqlvescgkvvrvadiadssndtvtq 


128 


Query : 


128 


D ft WT MITTfDniAn/rTDQTJTT CUT Piping T Tf T Rn<;T FT T UUniTinT T TTC; TTPPVTrrT K^ZJ T PHT 


187 






R TV Dnj.W rrinj. XT X X _L Cl ±r TU17TMT2inT 4- 174- J. P J. IT 4. 4. If T 4- TJ T 
K L D rLltW i r+ tJjtt t tb IjV VLIII1ALJL1 + r +TLi-i-£j-t--t- js. LiT riLi 




Sbjct: 


129 


RHALSESPDIVVATPANLLAYAEAGSVVDLKHVETLWDEADLVFAYGYEKDFKRLIKHL 


188 


Query : 


loo 


DRTVPiTiFT MQLT ITMirrSWPiA T tflTT T T HMOWTT If T rtPCAT DPD T/SPT PiPiTTPATWP TTTCTnTfTTT T 


9 AH 






d Tvnn t x c t\ t 1 i.nu 1 x t xhtdt7t>t vt xt xt nnr 1, ,1, 1 c 1 u* r\v x 
F 11 yA L1+0AI +LJ V +i\ Li Li+Nr V li J is.L>+r J +Li LiLJ-'-' + + + t UK ■+• 




Sbjct: 


189 


PPIYQAVLVSATLTDDVVRMKGLCLNNPVTLKLEEPELVPQDQLSHQRILAE-ENDKPAI 


247 


Query : 


9 a q 


T V7AT T TfT QT T RP If T T TPWMTT ITR Q VP T RT CT PAPQT RTPtJT MPT?T RT DCCTUTT QP1IPMP1P 
Li I HljLit\LoLilK^rSoljLiE VCJ 1 LiLit\o I KLiKJ-i t LtiL>\jE oL Jr 1 U V LiJMljlj.Lir Lit\0 KL-ri X. liDsjH Hyb 


JU / 






T VQT T T»fT T T RP V Q 4-4- tT\7M4-4- 4-R V4-4-HT ITT lTPlT T P17T M TTT D RUT CPlTTM-J-P 
Li I HLiLif\Li Li L KlafSo t+£ VNtt + k ittrLI! JjEj^JE J. L,v±jW SLLilr K fl loyi Ntb 




Sbjct: 


248 


LYALLKLRLIRGKSIIFVNSIDRCYKVRLFLEQFGIRACVLNSELPANIRIHTISQFNKG 


307 


Query : 




RVnPT7T 7iTn7A ITX7T P 7A D^7WP 6f D DPR P DliTPTWZl C nDP7iPI77i DP T nPUHU C? 7A 17T BFHT DDTRTT 


JO f 






v Pi xTTixn x r> f~~ x ic* _i. j. r> cx j.or"T nr 1 7 j. 17xmitt^ t> 
1 U TlA-r-JJ t c or IS. T+ U CiT +KLii uc v+ VtDJcU f 




Sbjct: 


308 


TYDIUASDEHHMEKP--GGKSATNRKSPRSGDMESSASRGIDFQCVNNVINFDFPRDVT 


365 


Query : 


368 


J\VTUDH^DT1\ DI MM rS/~" T I JT T 1 E7T 7T nTCACUT /~* T PPT T CTrMDrm T T T> VrMTTJ RyTC 1 TJ 1 T 

A I IrtKAtjKl AKANIN c tjl VLlt VLri tyt MLbf.1 tLLJj bLjtNKljtri L,hsr 1 IJr KMUjCji 


423 






+ Y1HRAGRIAR NN G VL+fcV b +Et L + + 1+ YQr +MLL-I- 




Sbjct : 


366 


SYIHRAGRTARGNNKGSVLSFVSMKESKVNDSVEKKLCDSFAAQEGEQI IKNYQFKMEEV 


425 


Query: 424 EGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPLHPA 483
           E FRYR +D  R+ T+ A+ + R++EIK E+L+ EKLK +FE+N RDLQ LRHD PL
Sbjct: 426 ESFRYRAQDCWRAATRVAVHDTRIREIKIEILNCEKLKAFFEENKRDLQALRHDKPLRAI 485

Query: 484 VVKPHLGHVPDYLVPPALRGLV 505
            V+ HL  +P+Y+VP AL+ +V
Sbjct: 486 KVQSHLSDMPEYIVPKALKRVV 507





Pedant information for DKFZphfbr2_82i24, frame 1 



Report for DKFZphfbr2_82i24.1



[LENGTH] 547

[MW] 61589.88

[pI] 9.34

[HOMOL] TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila melanogaster
tweety (tty), flightless (fli), dodo (dod), penguin (pen), small optic lobes (sol), innocent
bystander (iby), waclaw (waw), bobby sox (bbx), sluggish (slg), helicase (hlc), misato (mst),
and la costa (lcs) genes, complete cds. 1e-121

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 1e-109

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-42

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YLL008w] 8e-40

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S.
cerevisiae, YKR059w] 3e-39

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKR059w] 3e-39

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 3e-35

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119C] 3e-29

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-29

[FUNCAT] 1 genome replication, transcription, recombination and repair [H.
influenzae, HI0892] 1e-27

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-27

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 4e-21

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-05

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 4e-34

[PIRKW] RNA binding 7e-41

[PIRKW] DEAD box 2e-38

[PIRKW] transmembrane protein 9e-20

[PIRKW] DNA binding 8e-23

[PIRKW] ATP 1e-107

[PIRKW] purine nucleotide binding 2e-38

[PIRKW] P-loop 1e-107

[PIRKW] hydrolase 2e-35

[PIRKW] protein biosynthesis 2e-38

[PIRKW] ATP binding 7e-43

[SUPFAM] WW repeat homology 1e-26

[SUPFAM] DEAD/H box helicase homology 1e-107

[SUPFAM] unassigned DEAD/H box helicases 1e-107

[SUPFAM] ATP-dependent RNA helicase DBP1 3e-31

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-35

[SUPFAM] translation initiation factor eIF-4A 2e-38

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-26

[PROSITE] ATP_GTP_A 1

[PROSITE] LEUCINE_ZIPPER 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 9.87 %



SEQ MEDSEALGFEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAA

SEG

PRD ccccccccccccccchhhhhhhhhhccccccccccccccccccccceeeeecccccccee

SEQ YAIPMLQLLLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVSA

SEG

PRD ehhhhhhhhhhhcccccccccceeeeeeccchhhhhhhhhhhhhhhhhhhcceeeeeecc

SEQ AEDSVSQRAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEEL

SEG xxxxxxxxxxxx

PRD ccchhhhhhhhhcccceeeeccccchhhhhhcccccchhhhhhhhhhhhhhhhhcchhhh

SEQ KSLLCHLPRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCET

SEG

PRD hhhhhhccchhhhhhhhhccchhhhhhhhhhhcccceeeeeccccccchhhhhhhhhhhh

SEQ EEDKFLLLYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHI

SEG xxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhccceeeeeeehhhhhhhhhhhhhhcccceeeccccchhhhhhhh

SEQ ISQFNQGFYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNF

SEG xxxxxxxxxxxxx

PRD hhhhhccceeeeeeccccccccccccccccccccccccccccccccccccccceeeeeec

SEQ DLPPTPEAYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELLSGENRGPILLPYQFRM

SEG

PRD ccccccceeeeccccccccccccceeeeeecchhhhhhhhhhhhhhhccccccccccchh

SEQ EEIEGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPL

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhccc

SEQ HPAVVKPHLGHVPDYLVPPALRGLVRPHKKRKKLSSSCRKAKRAKSQNPLRSFKHKGKKF

SEG xxxxxxxxxxxxxxxxxx

PRD cccccccccccccceeeccccccccccccccccccchhhhhhcccccccccccccccccc

SEQ RPTAKPS

SEG

PRD ccccccc
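
The LOW_COMPLEXITY percentage in the Pedant report can be cross-checked against the SEG mask lines. A minimal sketch, assuming the percentage is simply the number of masked (x) residues divided by the peptide length, and assuming the four masked runs in the listing above are 12, 11, 13 and 18 residues long; the function name is illustrative:

```python
def low_complexity_percent(seg_masks: list[str], peptide_length: int) -> float:
    """Percentage of residues masked as low complexity (x or X) by SEG."""
    masked = sum(mask.lower().count("x") for mask in seg_masks)
    return round(100.0 * masked / peptide_length, 2)


if __name__ == "__main__":
    # Masked runs as read from the SEG lines; peptide length from [LENGTH].
    masks = ["x" * 12, "x" * 11, "x" * 13, "x" * 18]
    print(low_complexity_percent(masks, 547))
```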



Prosite for DKFZphfbr2_82i24.1

PS00017 51->59 ATP_GTP_A PDOC00017

PS00029 149->171 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphfbr2_82i24.1



HMM_NAME DEAD and DEAH box helicases

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF

GL+P +L +++++G+++PT IQ++AIP++LEG+D++A+A TGSGKTAA+
Query 13 GLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAAY 61

HMM UPMLQHIDwdP...WpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMn

+IPMLQ +++ + + + +R+L+L+PT ELA+Q Q +++++ ++
Query 62 AIPMLQLLLHRKATGPVVEQA-VRGLVLVPTKELARQAQSMIQQLATYCA 110

HMM g.IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDr.

+R++ + + Q +L+++P ++V++TP R++ H+++ +L+L++
Query 111 RDVRVANVSAAEDSVSQRAVLMEKP-DVVVGTPSRILSHLQQDSLKLRDS 159

HMM IeMLVMDEADRMLDMGFIDQIRrlMrqlPMpwNRQTMMFSATMPdelqEL

+E LV DEAD +++ GF++++ ++ ++P + Q + SAT+ +++Q L
Query 160 LELLVVDEADLLFSFGFEEELKSLLCHLP--RIYQAFLMSATFNEDVQAL 207

HMM ARrFMRNPIRInldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 

+ +++NP+ + + +++L + ++Q+ +++E E++KF +L+ L++ 
Query 208 KELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLLLYALLK 253 



HMM_NAME Helicases conserved C-terminal domain 

HMM +EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDV. . . 

+L+ +L++ I+++++ G +P + R I+ +FN+G Y++ I+TD+
Query 272 YRLRLFLEQFSIPTCVLNGELPLRSRCHIISQFNQGFYDCVIATDAEVL 320 

HMM ggRGIDI PdVNHVINYDMPWNPEqYI 

+RGID+ V+ V N+D+P +PE YI 
Query 321 GAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPEAYI 370 

HMM QRIGRTgRIG* 

+R+GRT+R++ 
Query 371 HRAGRTARAN 380 
DKFZphfbr2_82ml6



group: brain derived 

DKFZphfbr2_82ml6 encodes a novel 289 amino acid protein with very weak similarity to
A.thaliana F28A23.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A.thaliana F28A23.140 

complete cDNA, complete cds, few EST hits 
many ATGs in front of the ORF 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="4" 

Insert length: 2715 bp 

Poly A stretch at pos. 2705, polyadenylation signal at pos. 2687



1 AGAGGAGGGG AGAGGACTGG GGAGCCGAGC CAGAGCCGGG CTGCCTGCCA
51 CCCGGCTGCT CGTCCGCTAG CTGGGGAGGA GCGCTCCACC CGCAACTGAC
101 AAAGGATGGG AGAATGCCCG CGCCCCGGGA TGCCGGCCGC ACGCAGCCTG
151 GCGGCCGCCT GAGCTACTTC ACCCTCCGCC GGTAAGTGAC TGCAAACATC
201 ATTCATTCAA TCAGCCTCAC TGGGAGCCCC TTCTCTCCGG CTGGTAGTCC
251 TGGGCGGCTT GTCCCTGATC CCGAGCGGGG CTTGGCACAG CATCAGCCCT
301 GGAGGGCAGG CAGCAGGTGC CTTTGCCTGG TGGGTCCACT GGGGAGCGTG
351 GCTGGGGTTC GCGGCGGGTG CTGCCACCCA ACCTGCGGGC GGCGGGCTCG
401 CCCAGTAGGC GCCTCTCTGG TGAGAGGAGG CGGCTCCAGC CCGCATCCTG
451 GGGTAGTTGC TACTATTGGC CCCCAGCGCC CGCTCTGCGC GCGCGCCGTT
501 TCTGGCGGAT CCCCAGTGCG CGGCGCGCTG TTTACACCGG CGTGGTACTA
551 GTCACGGAGC CGCACCCCTC GGAAAGCGCG GAGTCGATGA CAGCCACTTC
601 ACAGGCTCAC GCGCTCCTAG TGTGGGCTTG AAGGGGACGG GGACCGATTA
651 CCAAAGGAGA GCGCTGAGTA CGGAAGACAC AGGGCAGCCT TTGTCTTGGG
701 TTTAGCGCTG ATGCGCTCAA CCCTGAGTCG GGTTCACTGC AACTGTTGTG
751 TCCGATTTCG GTTCCCTGCA ACCGCCCTCC TGGGCGAGAG ATGTCATTGT
801 GTTCCTGCGG CCAGCGGGAC TGAGAGCTGG GACTTAAGAC GCCAGGAGGG
851 TCCTGCGCTC ACGGGAAATG TACCCCAAAA GAACTCTGAG AGAATATACT
901 CAACTGTCCT GCTGTGATTA AACAAGACTG CTGTATTTTA ATTTCAGAAA
951 TTGAAAAGGG ATAGGAGGAA GGGGAAAATG CTGGGCTGGT GTGAAGCGAT
1001 AGCCCGTAAC CCTCACAGAA TTCCAAACAA CACGCGAACA CCCGAGATCT
1051 CAGGGGATTT GGCTGACGCC TCACAAACCT CCACATTGAA TGAAAAATCC
1101 CCAGGGCGAT CTGCAAGTCG ATCAAGTAAC ATTTCAAAAG CAAGCAGCCC
1151 AACAACAGGG ACAGCTCCCA GGAGCCAGTC AAGGTTGTCT GTCTGTCCAT
1201 CCACTCAGGA CATCTGCAGA ATCTGTCACT GCGAAGGGGA TGAAGAGAGC
1251 CCCCTCATCA CACCCTGTCG CTGCACTGGG ACACTGCGCT TTGTCCACCA
1301 GTCCTGCCTC CACCAGTGGA TAAAGAGCTC AGATACACGC TGCTGTGAGC
1351 TCTGCAAGTA TGACTTCATA ATGGAGACCA AGCTCAAACC CCTCCGGAAG
1401 TGGGAGAAAC TACAGATGAC CACAAGTGAA AGGAGGAAAA TATTCTGCTC
1451 TGTCACATTC CACGTAATCG CGATCACCTG TGTGGTTTGG TCTTTGTATG
1501 TATTGATAGA CCGGACAGCG GAGGAAATCA AGCAAGCCAA TGACAATGGT
1551 GTCCTTGAAT GGCCATTTTG GACAAAACTG GTTGTGGTAG CCATTGGCTT
1601 CACAGGAGGT CTTGTCTTCA TGTACGTACA GTGTAAAGTC TATGTTCAGT
1651 TGTGGCGCAG GCTGAAGGCC TACAACCGTG TGATCTTTGT ACAAAATTGC
1701 CCAGACACTG CCAAAAAACT GGAGAAGAAC TTCTCATGTA ATGTAAACAC
1751 AGACATCAAA GATGCTGTGG TAGTGCCTGT ACCACAAACA GGTGCAAATT
1801 CACTGCCATC TGCAGAGGGT GGCCCCCCTG AAGTTGTATC AGTCTGATGG
1851 AACCTGTTGG GAGTTTCTTC ACCGAAGAAT ATCTTTCTAG CCCTCAGCCA
1901 CTACAAATGA CAGAAGTGAC CTTGAATTAT TTACTCCCTT CAGCTCCTCC
1951 TTTCTCCTAC TGACACATTT TTCCTGACTT TGTTCAAAGA GGAAAGGAGA
2001 AAAACAAACA AACAGACCAA ATGCCCAGGA GCCCATGAAG TAATAGCGTA
2051 AAGTAAAGTA TGATATGGAA ATGTGAAGTT TGCAAGAGAA TGATTTCCAA
2101 GACAATTAAG AACTACTGGG GCAATGAATG CTTTTAGGCA GTAATCAAAG
2151 ATTAAATGGA CCCATGATAC TCTTCTTCAC AGTAACAGGG GAAAAGTTCA
2201 AGAATACAGA CTTGAATTGC GATGTGTATT ACTTCTAGGG CCTTGTAATG
2251 TTAACTGTCT CATCTGGAAA TAATAACTAA CATATTTGGT TTTAAGCCTG
2301 AAATTGTCTG CATTATCCCT AAGTCACATT GGAAGTGAAC TTGGAGGATG
2351 CATATTTTGA TATGCTTTGA CAGCTAACAG ATTTGTATGG TTTAGTGGAG
2401 TCTGGTTATT TTGACAGATG CATGTTTTTT TTAAATAGAT GCAATATACA
2451 TTTGAAGACA TTGATATTTG GAATTAATTA TGTTTGTTTA AGTCACGCAA
2501 AAGATTTTCA GAAAATGTTC GGATATAATT AGCTCTGTTA AATACCCACA
2551 GAACTGTTAT CAGGTCTTAT ATTTATTTTC ATCTGGTTCC TCTAATACAG
2601 TGCTGTCCAA TAGAAACACA ACAGCCACAA ATGCAGGCCA CAGATGCAAA
2651 TATTTAACTT CCCAGTAGCC CTATTTTAAA AAGTAAAAAT AAATGTTTGT
2701 TTGTTAAAAA AAAAA



BLAST Results 



Entry G37457 from database EMBLNEW: 
SHGC-57357 Human Homo sapiens STS genomic. 
Length = 458 
Plus Strand HSPs: 

Score = 2116 (317.5 bits), Expect = 4.3e-91, P = 4.3e-91 
Identities = 444/456 (97%) 



Medline entries 



No Medline entry 



Peptide information for frame 3 



1 MLGWCEAIAR NPHRIPNNTR TPEISGDLAD ASQTSTLNEK SPGRSASRSS 

51 NISKASSPTT GTAPRSQSRL SVCPSTQDIC RICHCEGDEE SPLITPCRCT 

101 GTLRFVHQSC LHQWIKSSDT RCCELCKYDF IMETKLKPLR KWEKLQMTTS

151 ERRKIFCSVT FHVIAITCVV WSLYVLIDRT AEEIKQGNDN GVLEWPFWTK

201 LVVVAIGFTG GLVFMYVQCK VYVQLWRRLK AYNRVIFVQN CPDTAKKLEK 

251 NFSCNVNTDI KDAVVVPVPQ TGANSLPSAE GGPPEVVSV 

ORF from 978 bp to 1844 bp; peptide length: 289 
Category: similarity to unknown protein 



BLASTP hits 
Entry AB011169_1 from database TREMBL: 

gene: "KIAA0597"; product: "KIAA0597 protein"; Homo sapiens mRNA for 

KIAA0597 protein, partial cds. 

Score = 188, P = 6.0e-12, identities = 30/54, positives = 38/54 
Entry SPBC14F5_7 from database TREMBL:

gene: "SPBC14F5.07"; product: "hypothetical protein"; S.pombe
chromosome II cosmid c14F5.

Score = 185, P = 1.9e-11, identities = 29/53, positives = 38/53
Entry CEY57A10B_1 from database TREMBL:

gene: "Y57A10B.1"; Caenorhabditis elegans cosmid Y57A10B

Score = 171, P = 2.6e-10, identities = 40/107, positives = 58/107



Alert BLASTP hits for DKFZphfbr2_82ml6, frame 3 

TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII

project), N = 1, Score = 198, P = 3.4e-13



>TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";

Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII project)
Length = 1,051

HSPs:

Score = 198 (29.7 bits), Expect = 3.4e-13, P = 3.4e-13
Identities = 38/103 (36%), Positives = 61/103 (59%)

Query: 28 LADASQTSTLNEKSPGRSASRS-SNISKASSPTTGTAPRSQSRLSVCPSTQDICRICHCE 86
          +++ S +S+ +  SP +++    SN+  A S  TG+              +D+CRIC
Sbjct: 20 VSEPSVSSSSSSSSPNQASPNPFSNMDPAVSTATGSRYVDDDE-----DEEDVCRICRMP 74

Query: 87 GDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDTRCCELCKYDF 130
          GD ++PL  PC C+G+++FVHQ CL QW+  S+ R CE+CK+ F
Sbjct: 75 GDADNPLRYPCACSGSIKFVHQDCLLQWLNHSNARQCEVCKHPF 118
Report for DKFZphfbr2_82ml6.3

[LENGTH] 289

[MW] 32308.36

[pI] 8.76

[HOMOL] PIR:T00268 hypothetical protein KIAA0597 - human (fragment) 9e-14

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YIL030c] 4e-09

[PIRKW] transmembrane protein 9e-08 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.57 % 

SEQ MLGWCEAIARNPHRIPNNTRTPEISGDLADASQTSTLNEKSPGRSASRSSNISKASSPTT

SEG XXXXXXXXXXXXXXXXXXX . . 



PRD ccchhhhhhccccccccccccccccchhhhhhhhhccccccccccccccccccccccccc 

SEQ GTAPRSQSRLSVCPSTQDICRICHCEGDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDT 

SEG 

PRD ccccccccccccccccceeeeeeecccccccccccccccccceeeeehhhhhhhhhcccc 

SEQ RCCELCKYDFIMETKLKPLRKWEKLQMTTSERRKIFCSVTFHVIAITCVVWSLYVLIDRT

SEG 

PRD ceeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

SEQ AEEIKQGNDNGVLEWPFWTKLVVVAIGFTGGLVFMYVQCKVYVQLWRRLKAYNRVIFVQN 

SEG 

PRD ccccccccccceeehhhhheeeeeeecccccceeeeehhhhhhhhhhhhhhhheeeeeee 

SEQ CPDTAKKLEKNFSCNVNTDIKDAVVVPVPQTGANSLPSAEGGPPEVVSV 

SEG 

PRD ccchhhhhhccccccccccceeeeeeecccccccccccccccccccccc 



Prosite for DKFZphfbr2_82ml6.3

PS00001 17->21 ASN_GLYCOSYLATION PDOC00001

PS00001 51->55 ASN_GLYCOSYLATION PDOC00001

PS00001 251->255 ASN_GLYCOSYLATION PDOC00001

PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005

PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005

PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005

PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006

PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006

PS00006 149->152 CK2_PHOSPHO_SITE PDOC00006

PS00006 180->184 CK2_PHOSPHO_SITE PDOC00006

PS00007 121->129 TYR_PHOSPHO_SITE PDOC00007

PS00008 187->193 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_82ml6.3)
DKFZphfbr2_82m6



group: signal transduction 

DKFZphfbr2_82m6.3 encodes a novel 654 amino acid protein with similarity to murine sphingosine
kinase.

Sphingosine kinase is a new type of lipid kinase, which is regulated by growth factors. The
enzyme phosphorylates sphingosine to sphingosine 1-phosphate (SPP), which subsequently exerts
intracellular and extracellular actions. Intracellularly, SPP promotes proliferation and inhibits
apoptosis. In yeast, survival of cells exposed to heat shock is dependent on SPP.
Extracellularly, SPP inhibits cell motility and influences cell morphology, effects that appear
to be mediated by the G protein-coupled receptor EDG1.

The new protein can find application in modulating/blocking the sphingosine kinase
intracellular signal transmission pathway.



strong similarity to mouse "sphingosine kinase" 

complete cDNA, complete cds, EST hits, 
YLR260w/YOR171c Lcb5p/Lcb4p = long chain base kinases, 
involved in biosynthesis of sphingolipids 

Sequenced by DKFZ 

Locus: unknown

Insert length: 2875 bp

Poly A stretch at pos. 2865, polyadenylation signal at pos. 2838
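
The two positions quoted for the poly A stretch and the polyadenylation signal can be located by a plain string search. The sketch below is illustrative only: it assumes the reported positions are 0-based offsets into the insert, that the signal is the canonical AATAAA hexamer, and that the poly A stretch is the trailing run of A residues. TAIL holds the last 75 bases of this insert (rows 2801 and 2851 of the sequence listing that follows); all names are hypothetical.

```python
# Last 75 bases of the insert; TAIL_OFFSET is the 0-based offset of TAIL[0].
TAIL_OFFSET = 2800
TAIL = (
    "CCGTCCCCAATCTAAAAAGCAATTGAAAAGGTCTATGCAATAAAGGCAGT"
    "CGCTTCATTCCTCTC" + "A" * 10
)


def polya_signal_offset(seq: str) -> int:
    """0-based offset of the last AATAAA hexamer, or -1 if absent."""
    return seq.rfind("AATAAA")


def polya_stretch_offset(seq: str) -> int:
    """0-based offset at which the trailing run of A residues begins."""
    return len(seq.rstrip("A"))


if __name__ == "__main__":
    print(TAIL_OFFSET + polya_signal_offset(TAIL))
    print(TAIL_OFFSET + polya_stretch_offset(TAIL))
```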



1 AGTGTTGGAG GTGAGGAGGC GGGGCTGGCA GGGCTAGTCG GGGCATCTGG 
51 AAATTTCCGA CCCCACGCTT CGGGCGTTTC CTTATCAGGT TCACCGCTCC 
101 CTGATCTCGC GCTGCACTTC GTAGGCGCAG CCGCTGCTTG GGAAGTCCTA 
151 CTTAAGAGCT GAAGGTCAGG CCAGGACAGT GAGACCTGAC TCCTTGCTCC 
201 TACCAGCCTA CTATGGCTTA AGACCCAGGG CCAGGGTCCC GTTGATGTAA 
251 CAGAGCAGAG GACCAGCAGA TGAATGGACA CCTTGAAGCA GAGGAGCAGC 
301 AGGACCAGAG GCCAGACCAG GAGCTGACCG GGAGCTGGGG CCACGGGCCT 
351 AGGAGCACCC TGGTCAGGGC TAAGGCCATG GCCCCGCCCC CACCGCCACT 
401 GGCTGCCAGC ACCTCGCTCC TCCATGGCGA GTTTGGCTCC TACCCAGCCC 
451 GAGGCCCACG CTTTGCCCTC ACCCTTACAT CGCAGGCCCT GCACATACAG
501 CGGCTGCGCC CCAAACCTGA AGCCAGGCCC CGGGGTGGCC TGGTCCCGTT 
551 GGCCGAGGTC TCAGGCTGCT GCACCCTGCG AAGCCGCAGC CCCTCAGACT 
601 CAGCGGCCTA CTTCTGCATC TACACCTACC CTCGGGGCCG GCGCGGGGCC 
651 CGGCGCAGAG CCACTCGCAC CTTCCGGGCA GATGGGGCCG CCACCTACGA 
701 AGAGAACCGT GCCGAGGCCC AGCGCTGGGC CACTGCCCTC ACCTGTCTGC 
751 TCCGAGGACT GCCACTGCCC GGGGATGGGG AGATCACCCC TGACCTGCTA
801 CCTCGGCCGC CCCGGTTGCT TCTATTGGTC AATCCCTTTG GGGGTCGGGG 
851 CCTGGCCTGG CAGTGGTGTA AGAACCACGT GCTTCCCATG ATCTCTGAAG 
901 CTGGGCTGTC CTTCAACCTC ATCCAGACAG AACGACAGAA CCACGCCCGG 
951 GAGCTGGTCC AGGGGCTGAG CCTGAGTGAG TGGGATGGCA TCGTCACGGT 
1001 CTCGGGAGAC GGGCTGCTCC ATGAGGTGCT GAACGGGCTC CTAGATCGCC 
1051 CTGACTGGGA GGAAGCTGTG AAGATGCCTG TGGGCATCCT CCCCTGCGGC 
1101 TCGGGCAACG CGCTGGCCGG AGCAGTGAAC CAGCACGGGG GATTTGAGCC 
1151 AGCCCTGGGC CTCGACCTGT TGCTCAACTG CTCACTGTTG CTGTGCCGGG 
1201 GTGGTGGCCA CCCACTGGAC CTGCTCTCCG TGACGCTGGC CTCGGGCTCC 
1251 CGCTGTTTCT CCTTCCTGTC TGTGGCCTGG GGCTTCGTGT CAGATGTGGA 
1301 TATCCAGAGC GAGCGCTTCA GGGCCTTGGG CAGTGCCCGC TTCACACTGG 
1351 GCACGGTGCT GGGCCTCGCC ACACTGCACA CCTACCGCGG ACGCCTCTCC 
1401 TACCTCCCCG CCACTGTGGA ACCTGCCTCG CCCACCCCTG CCCATAGCCT
1451 GCCTCGTGCC AAGTCGGAGC TGACCCTAAC CCCAGACCCA GCCCCGCCCA
1501 TGGCCCACTC ACCCCTGCAT CGTTCTGTGT CTGACCTGCC TCTTCCCCTG 
1551 CCCCAGCCTG CCCTGGCCTC TCCTGGCTCG CCAGAACCCC TGCCCATCCT 
1601 GTCCCTCAAC GGTGGGGGCC CAGAGCTGGC TGGGGACTGG GGTGGGGCTG 
1651 GGGATGCTCC GCTGTCCCCG GACCCACTGC TGTCTTCACC TCCTGGCTCT 
1701 CCCAAGGCAG CTCTACACTC ACCCGTCTCC GAAGGGGCCC CCGTAATTCC 
1751 CCCATCCTCT GGGCTCCCAC TTCCCACCCC TGATGCCCGG GTAGGGGCCT 
1801 CCACCTGCGG CCCGCCCGAC CACCTGCTGC CTCCGCTAGG CACCCCGCTG 
1851 CCCCCAGACT GGGTGACGCT GGAGGGGGAC TTTGTGCTCA TGTTGGCCAT 
1901 CTCGCCCAGC CACCTAGGCG CTGACCTGGT GGCAGCTCCG CATGCGCGCT 
1951 TCGACGACGG CCTGGTGCAC CTGTGCTGGG TGCGTAGCGG CATCTCGCGG 
2001 GCTGCGCTGC TGCGCCTTTT CTTGGCCATG GAGCGTGGTA GCCACTTCAG 
2051 CCTGGGCTGT CCGCAGCTGG GCTACGCCGC GGCCCGTGCC TTCCGCCTAG 
2101 AGCCGCTCAC ACCACGCGGC GTGCTCACAG TGGACGGGGA GCAGGTGGAG 
2151 TATGGGCCGC TACAGGCACA GATGCACCCT GGCATCGGTA CACTGCTCAC 
2201 TGGGCCTCCT GGCTGCCCGG GGCGGGAGCC CTGAAACTAA ACAAGCTTGG 
2251 TACCCGCCGG GGGCGGGGCC TACATTCCAA TGGGGCGGAG CCTGAGCTAG 
2301 GGGGTGTGGC CTGGCTGCTA GAGTTGTGGT GGCAGGGGCC CTGGCCCCGT 
2351 CTCAGGATTG CGCTCGCTTT CATGGGACCA GACGTGATGC TGGAAGGTGG
2401 GCGTCGTCAC GGTTAAAGAG AAATGGGCTC GTCCCGAGGG TAGTGCCTGA 
2451 TCAATGAGGG CGGGGCCTGG CGTCTGATCT GGGGCCGCCC TTACGGGGCA 
2501 GGGCTCAGTC CTGACGCTTG CCACCTGCTC CTACCCGGCC AGGATGGCTG 
2551 AGGGCGGAGT CTATTTTACG CGTCGCCCAA TGACAGGACC TGGAATGTAC 
2601 TGGCTGGGGT AGGCCTCAGT GAGTCGGCCG GTCAGGGCCC GCAGCCTCGC 
2651 CCCATCCACI CCGGTGCCTC CATTTAGCTG GCCAATCAGC CCAGGAGGGG 
2701 CAGGTTCCCC GGGGCCGGCG CTAGGATTTG CACTAATGTT CCTCTCCCCG 
2751 CGGGTGGGGG CGGGGAAATT CATATCCCCT GTTCGTCTCA TGCGCGTCCT 
2801 CCGTCCCCAA TCTAAAAAGC AATTGAAAAG GTCTATGCAA TAAAGGCAGT 
2851 CGCTTCATTC CTCTCAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



99045661: 

Tumor necrosis factor-alpha induces adhesion molecule 
expression through the sphingosine kinase pathway. 

98395082: 

Molecular cloning and functional characterization 
of murine sphingosine kinase. 

98241633: 

Purification and characterization of rat kidney sphingosine kinase. 
99178622: 

Sphingosine 1-phosphate: a prototype of a new class of second 
messengers.



Peptide information for frame 3 



1 MNGHLEAEEQ QDQRPDQELT GSWGHGPRST LVRAKAMAPP PPPLAASTSL 

51 LHGEFGSYPA RGPRFALTLT SQALHIQRLR PKPEARPRGG LVPLAEVSGC 

101 CTLRSRSPSD SAAYFCIYTY PRGRRGARRR ATRTFRADGA ATYEENRAEA 

151 QRWATALTCL LRGLPLPGDG EITPDLLPRP PRLLLLVNPF GGRGLAWQWC 

201 KNHVLPMISE AGLSFNLIQT ERQNHARELV QGLSLSEWDG IVTVSGDGLL 

251 HEVLNGLLDR PDWEEAVKMP VGILPCGSGN ALAGAVNQHG GFEPALGLDL 

301 LLNCSLLLCR GGGHPLDLLS VTLASGSRCF SFLSVAWGFV SDVDIQSERF 

351 RALGSARFTL GTVLGLATLH TYRGRLSYLP ATVEPASPTP AHSLPRAKSE 

401 LTLTPDPAPP MAHSPLHRSV SDLPLPLPQP ALASPGSPEP LPILSLNGGG 

451 PELAGDWGGA GDAPLSPDPL LSSPPGSPKA ALHSPVSEGA PVIPPSSGLP 

501 LPTPDARVGA STCGPPDHLL PPLGTPLPPD WVTLEGDFVL MLAISPSHLG 

551 ADLVAAPHAR FDDGLVHLCW VRSGISRAAL LRLFLAMERG SHFSLGCPQL 

601 GYAAARAFRL EPLTPRGVLT VDGEQVEYGP LQAQMHPGIG TLLTGPPGCP 

651 GREP 



ORF from 270 bp to 2231 bp; peptide length: 654 
Category: similarity to known protein 
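
The ORF coordinates and the peptide length quoted above are tied together by simple arithmetic: with 1-based inclusive coordinates and the stop codon left out, an ORF from 270 bp to 2231 bp spans 1962 nucleotides, i.e. 654 codons. A minimal sketch of that check follows; the function name `orf_codons` is a hypothetical helper introduced here for illustration and is not part of the original analysis pipeline.

```python
def orf_codons(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Coordinates are assumed to be 1-based and inclusive, with the stop
    codon excluded (an assumption that matches the figures in this entry).
    """
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a positive multiple of 3")
    return length // 3


# ORF from 270 bp to 2231 bp, as listed for frame 3 of this clone.
print(orf_codons(270, 2231))
```

If the stop codon were included in the quoted range, the result would be one codon larger than the peptide length, so the check also documents the coordinate convention.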



BLASTP hits 
Entry SPAC4A8_7 from database TREMBL: 

gene: "SPAC4A8.07c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c4A8. 

Score = 301, P = 7.9e-32, identities = 68/190, positives = 109/190 
Entry CEC34C6_3 from database TREMBLNEW: 

product: "C34C6.5"; Caenorhabditis elegans cosmid C34C6 
>TREMBL:CEC34C6_3 product: "C34C6.5"; Caenorhabditis elegans cosmid 
C34C6 

Score = 273, P = 9.0e-29, identities = 78/265, positives = 142/265 
Entry S67059 from database PIR: 

hypothetical protein YOR171c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SC55021_9 gene: "03615"; product: "03615p"; Saccharomyces 
cerevisiae cosmid pUOA1258 from chromosome 15R. >TREMBL:SCYOR170W_2 
S. cerevisiae chromosome XV reading frame ORF YOR170w 

Score = 253, P = 2.0e-25, identities = 70/234, positives = 116/234 
Entry S51398 from database PIR: 

hypothetical protein YLR260w - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCL8479_4 gene: "YLR260W"; product: "Ylr260wp"; Saccharomyces 
cerevisiae chromosome XII cosmid 8479. 

Score = 251, P = 1.0e-24, identities = 62/198, positives = 103/198 



Alert BLASTP hits for DKFZphfbr2_82m6, frame 3 

TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus 
musculus sphingosine kinase (SPHK1b) mRNA, complete cds., N = 2, Score 
= 615, P = 1.2e-92 

TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus 
musculus sphingosine kinase (SPHK1a) mRNA, partial cds., N = 2, Score = 
616, P = 2e-92 

TREMBL:ATF18E5_16 gene: "F18E5.160"; product: "putative protein"; 
Arabidopsis thaliana DNA chromosome 4, BAC clone F18E5 (ESSAII 
project), N = 2, Score = 370, P = 6.8e-33 



>TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus 
musculus sphingosine kinase (SPHK1a) mRNA, partial cds. 
Length = 504 

HSPs: 



Score = 616 (92.4 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92 
Identities = 128/260 (49%), Positives = 173/260 (66%) 

Query: 154 ATALTCLLRGLPLPGDGEITPDLLPRPPRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGL 213 
           A    C      L  + E    LLPRP R+L+L+NP GG+G A Q  ++ V P + EA + 
Sbjct: 110 APVAPCQREPRDLAMEPECPRGLLPRPCRVLVLLNPQGGKGKALQLFQSRVQPFLEEAEI 169 

Query: 214 SFNLIQTERQNHARELVQGLSLSEWDGIVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGI 273 
           +F LI TER+NHARELV    L  WD +  +SGDGL+HEV+NGL++RPDWE A++ P+ 
Sbjct: 170 TFKLILTERKNHARELVCAEELGHWDALAVMSGDGLMHEVVNGLMERPDWETAIQKPLCS 229 

Query: 274 LPCGSGNALAGAVNQHGGFEPALGLDLLLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFL 333 
           LP GSGNALA +VN + G+E     DLL+NC+LLLCR    P++LLS+  ASG R +S L 
Sbjct: 230 LPGGSGNALAASVNHYAGYEQVTNEDLLINCTLLLCRRRLSPMNLLSLHTASGLRLYSVL 289 

Query: 334 SVAWGFVSDVDIQSERFRALGSARFTLGTVLGLATLHTYRGRLSYLPA-TVEPASPTPAH 392 
           S++WGFV+DVD++SE++R LG  RFT+GT   LA+L  Y+G+L+YLP  TV  AS  PA 
Sbjct: 290 SLSWGFVADVDLESEKYRRLGEIRFTVGTFFRLASLRTYQGQLAYLPVGTV--ASKRPAS 347 

Query: 393 SL-PRAKSELTLTPDPAPPMAH 413 
           +L  +   +  L P   P  +H 
Sbjct: 348 TLVQKGPVDTHLVPLEEPVPSH 369 

Score = 324 (48.6 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92 
Identities = 72/160 (45%), Positives = 100/160 (62%) 

Query: 499 LPLPTPDARVGASTC---GPPDHLLPPLGTPLPPDWVTL-EGDFVLMLAISPSHLGADLV 554 
           LP+ T  ++  AST    GP D  L PL  P+P  W  + E DF+L+L +  +HL ++L 
Sbjct: 335 LPVGTVASKRPASTLVQKGPVDTHLVPLEEPVPSHWTVVPEQDFLLVLVLLHTHLSSELF 394 

Query: 555 AAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQLGYAAARAFRLEPLT 614 
           AAP  R +  G++HL +VR+G+SRAALLRLFLAM++G H  L CP L +    AFRLEP + 
Sbjct: 395 AAPMGRCEAGVMHLFYVRAGVSRAALLRLFLAMQKGKHMELDCPYLVHVPVVAFRLEPRS 454 

Query: 615 PRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCP-GRE 653 
            RGV +VDGE +    +Q Q+HP    ++ G    P GR+ 
Sbjct: 455 QRGVFSVDGELMVCEAVQGQVHPNYLWMVCGSRDAPSGRD 494 

Score = 37 (5.6 bits), Expect = 3.6e-62, Sum P(2) = 3.6e-62 
Identities = 8/20 (40%), Positives = 9/20 (45%) 

Query: 459 GAGDAPLSPDPLLSSPPGSP 478 
           G+ DAP   D     PP  P 
Sbjct: 485 GSRDAPSGRDSRRGPPPEEP 504 

Pedant information for DKFZphfbr2_82m6, frame 3 



Report for DKFZphfbr2_82m6.3 



[LENGTH] 654 

[MW] 69207.45 

[pi] 6.47 

[HOMOL] TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus musculus sphingosine kinase (SPHK1b) mRNA, complete cds. 2e-50 

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YLR260w] 4e-20 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 

[PROSITE] MYRISTYL 12 

[PROSITE] CK2_PHOSPHO_SITE 

[PROSITE] TYR_PHOSPHO_SITE 

[PROSITE] GLYCOSAMINOGLYCAN 

[PROSITE] PKC_PHOSPHO_SITE 

[PROSITE] ASN_GLYCOSYLATION 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 20.18 % 


SEQ MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAASTSLLHGEFGSYPA 

SEG xxxxxxxxxxxxx 

PRD ccchhhhhhhhcccccceeecccccccceeehhhhhccccccceeeceeeeccccccccc 

SEQ RGPRFALTLTSQALHIQRLRPKPEARPRGGLVPLAEVSGCCTLRSRSPSDSAAYFCIYTY 

SEG 

PRD cccceeehhhhhhhhhhhhhccccccccccceeeeeeeceeeeeecccccceeeeeeeec 

SEQ PRGRRGARRRATRTFRADGAATYEENRAEAQRWATALTCLLRGLPLPGDGEITPDLLPRP 

SEG .xxxxxxxxxxxxxxxxxxxxx xxxxx 

PRD ccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGLSFNLIQTERQNHARELVQGLSLSEWDG 

SEG xxxxxx 

PRD ceeeeeeecccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccce 

SEQ IVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGILPCGSGNALAGAVNQHGGFEPALGLDL 

SEG xxxxx 

PRD eeeecccccceeeccccccccchhhhhccceeeccccccccccccccccccccchhhhhh 

SEQ LLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFLSVAWGFVSDVDIQSERFRALGSARFTL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhccccccccccceeeeeeccccceeeeeeeeccccceeeehhhhhhhhhhhhhhc 

SEQ GTVLGLATLHTYRGRLSYLPATVEPASPTPAHSLPRAKSELTLTPDPAPPMAHSPLHRSV 

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SDLPLPLPQPALASPGSPEPLPILSLNGGGPELAGDWGGAGDAPLSPDPLLSSPPGSPKA 

SEG . . xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccce 

SEQ ALHSPVSEGAPVIPPSSGLPLPTPDARVGASTCGPPDHLLPPLGTPLPPDWVTLEGDFVL 

SEG xx xxxxxxxxxxxxxxx 

PRD eeccccccccccccccccccccccccccccccccccccccccccccccccccccccccee 

SEQ MLAISPSHLGADLVAAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQL 

SEG 

PRD eeeeecccccccccccccccccccceeeeeeeccchhhhhhhhhhhhhcccceeecccch 

SEQ GYAAARAFRLEPLTPRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCPGREP 

SEG \ xxxxxxxxxxxxxxx... 

PRD hhhhhhhhhhccccccceeeeccceeecccccccccccccceeecccccccccc 



Prosite for DKFZphfbr2_82m6.3 



PS00001  303->307  ASN_GLYCOSYLATION  PDOC00001 
PS00002  245->249  GLYCOSAMINOGLYCAN  PDOC00002 
PS00004  129->133  CAMP_PHOSPHO_SITE  PDOC00004 
PS00005  102->105  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  134->137  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  220->223  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  347->350  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  355->358  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  371->374  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  477->480  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  614->617  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  107->111  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  142->146  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  234->238  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  236->240  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  341->345  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  419->423  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  106->115  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  56->62    MYRISTYL           PDOC00008 
PS00008  212->218  MYRISTYL           PDOC00008 
PS00008  232->238  MYRISTYL           PDOC00008 
PS00008  272->278  MYRISTYL           PDOC00008 
PS00008  277->283  MYRISTYL           PDOC00008 
PS00008  279->285  MYRISTYL           PDOC00008 
PS00008  361->367  MYRISTYL           PDOC00008 
PS00008  476->482  MYRISTYL           PDOC00008 
PS00008  509->515  MYRISTYL           PDOC00008 
PS00008  574->580  MYRISTYL           PDOC00008 
PS00008  590->596  MYRISTYL           PDOC00008 
PS00008  640->646  MYRISTYL           PDOC00008 
PS00009  122->126  AMIDATION          PDOC00009 



(No Pfam data available for DKFZphfbr2_82m6.3) 

DKFZphfkd2_lj9 



group: kidney derived 

DKFZphfkd2_lj9.3 encodes a novel 105 amino acid protein with high similarity to Xenopus laevis 
XLCL2 protein. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 

strong similarity to XLCL2 protein, African clawed frog 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 2955 bp 

Poly A stretch at pos. 2935, polyadenylation signal at pos. 2915 



1 GGGGGGGGCT GAGTGCTCAG TGGAGAGCGG GGAGTTGTGT CCACCTTGCC 
51 GACGTCGCTA GCCGTGGGGC TGTCCTGGGA AGGCGGACGG CGAGCGCCCG 
101 GTGTCCGCAC TCGGCCGCCT GCCGTGCCCG TCTGCGCCCG TGTCATCCTC 
151 ACTCGGGACG CAGGGACCGT TTTTAAATCA CAGGGGCGTG TGTCAGCCTG 
201 CCCTAGGACT TCATGTCTAT ATATTTCCCC ATTCACTGCC CCGACTATCT 
251 GAGATCGGCC AAGATGACTG AGGTGATGAT GAACACCCAG CCCATGGAGG 
301 AGATCGGCCT CAGCCCCCGC AAGGATGGCC TTTCCTACCA GATCTTCCCA 
351 GACCCGTCAG ATTTTGACCG CCGCTGCAAA CTGAAGGACC GTCTGCCCTC 
401 CATAGTGGTG GAACCCACAG AAGGGGAGGT GGAGAGCGGG GAGCTCCGGT 
451 GGCCCCCTGA GGAGTTCCTG GTCCAGGAGG ATGAGCAAGA TAACTGCGAA 
501 GAGACAGCGA AAGAAAATAA AGAGCAGTAG AGTCCCTGTG GACTCCCATG 
551 GGTCATACCA GCCAGCATCT GTTCCTGAAC TGTGTTTTTC CCATCATGAC 
601 GGAAGAAGAG AGTGAGCCGC AATTGTTCTG AAAATGTCAA ACGAGGCTTC 
651 TGTTTTGCAC CTGCAGATCA CCGAGTTGGT TTTCTTTTCT TTTCTTGCCT 
701 [illegible] TTTGAAATTT GCCGAGCAGT GGAGCCCTCT GACAATTTGC 
751 AAGGCCCTCT GAGAAAGGAA GCTGCTTAGA GCCAGGGGGT TAGTGGGTGA 
801 GGGGAGCGAG TGCTGTTTTI GAGATCATTA TCTGAACTCA GGCAGCCTAG 
851 TAGAGGCAGT GGTGGGATTC CAATGGGTCT TGGTGGGTGG GAGGTGGGGC 
901 ATGTGCAAAG CAAGCAAGGA ACATTTGGGG TAAGAAAACA AACATGAGGC 
951 AAAAGAAAAA ATACATGTTT TTAAGAAAAC ATTGAGCAGA GAACTGCAGC 
1001 CAGGATGCGC TCAGCAGACA TTCACTCTGG CCGCTGGGAC ATCAGAAAAC 
1051 AAAGTCTTCA TCTCTCTCTC CAGTTTCACC CACCCCACCC TTTGCTTTCA 
1101 TTTCAGGTGT GTTGGTCTAT ATGACAGGGA GGAGAGTAAA GGAGAGCAGG 
1151 AGCAATTGGC TGCCTGCAAA GCCAGCTGGA GGTGAAGTGC AGGAAAGGAA 
1201 AGGTCACCCC ATTCTACTCC ATGGCCTCTC TGCTCCCAGC TGTGGTAGGC 
1251 TCACATAGCC AGTGTGATCG GTTTTTAAGA GGCAGTGCTT TTCAGCTTTT 
1301 CTCCCTGATA TATCCATTTT GCTTCCCAGC ACTTTTTAGG AGTAGTGAGA 
1351 GCACTTCCTG CCCTTGTTGG AAGCCCCAGG GTGGACACTC AGCACGAAGG 
1401 TCTCTCCCTT AACTGCTGCC CTTCCAAGAC TTGCTCCCGA GATGGAGTGG 
1451 GCGTGGTCTT CCAGGCTGGC CCTTCCTTCT CCTCACCGCC ACCTTCCCTG 
1501 CCCCAGCCCC AGCAGCCATG GGTACATGGG TCCCCAGCTC ACCTATGGAT 
1551 TCCCGCCAGT CTGCCCAGCT GCAGTACTCA CGCCCCATGG GGGATCTTGG 
1601 TCTGTTTTTC TTGTGGGAGC CTAGTGGAGA GCAGACGTGG CTTTTTATGT 
1651 GTCTTGTTGG GGAGGTGACT TGCATGGTGG GGACAAGGCT GTCGTGGCAA 
1701 CCTTGGGATC GAGTTTGAGA CTAAAGGATG TCATGAGATC CCTGGCTTCT 
1751 CCCCATGTTG TTCCCGGACA AGGGCAGAAG GGAGGCATGG CAAGGGACCT 
1801 CTGCTGTCCT TACTCAACAG TGGTCCTCAT CCCTCCCCAC CTCCCACTGC 
1851 TTCCTGCAAG GGCACCAGTT GTATGAGAAA GTTGGCCTTT GGACTTAGGA 
1901 TTTCTTATTG TAGCTAAGAG CCATCTGAAG CAGCAGGTTG CAGGACAAAT 
1951 GCTTCAGTCC GCCGAGAGCA GTACCGTGTG GCCAAGAGGT GGACTCAGAG 
2001 CCTTCCTTGA GCTAAACTCG GCCAACCAAG GCACGCAGCA TGTCCCCTCA 
2051 GGTCTCCAGT CAGTCCAGGT TGACCCTCAG TTCTGGACGT GTGTATATAG 
2101 CTGTATTTAA TACCTCAAGG TCATTGTGGC TCTGGGGATG CCAGGGCAGG 
2151 AGGACGAGGG TGCGCTGTGG ACACAGCAGT CCGCGGAATT CCGTTCTGGG 
2201 AAGCCAATGG TCGCCGGCAC CCCTTGCTTC CTCCCTCTGT TGTCTGCCTG 
2251 TGTGACACAC ATCAATGGCA ATAACTTCTT CCAACTCCTC GCAGAAGTGG 
2301 GAGAGGCCGG CAGCCTGCAC CGAGAGGGGC TTTCCTCTCT CTTGCTCCCC 
2351 GCTTCGTTCT GTTTTGGCTG CAGAGAGTGG TTCATCCATA CTCTCATTCC 
2401 CTCGCCTCCC CTTGTGGACG GGGGTCTTGC CTTTTCAATT CCTGTGTTTT 
2451 GGTGTCTTCC CTTATCTGCT ACCCTGAATC ACCTGTCCTG GTCTTGCTGT 
2501 GTGATGGGAA CATGCTTGTA AACTGCGTAA CAAATCTACT TTGTGTATGT 
2551 GTCTGTTTAT GGGGGTGGTT TATTATTTTT GCTGGTCCCT AGACCACTTT 
2601 GTATGACCGT TTGCAGTCTG AGCAGGCCAG GGGCTGACAG CTAATGTCAG 
2651 GACCCTCAGC GGTGGAGCCT GCTGGGGGGA CCCAGCTGCT CTTGGACAAG 
2701 TGGCTGAGCT CCTATCTGGC CTCCTCTTTT TTTTTTTTTT CAAGTAATTT 
2751 GTGTGTATTT CTAACTGATT GTATTGAAAA AATTCCTAGT ATTTCAGTAA 
2801 AAATGCCTGT TGTGAGATGA ACCTCCTGTA ACTTCTATCT GTTCTTTTTT 
2851 GAGGCTCAGG GAGAAACTAG CATTTTTTTT TTTCCAAACT ACTTTTTGTC 
2901 ACTGTGACAG TTGTAAATAA AGTTTGAAAA TGCTCAAAAA AAAAAAAAAA 
2951 AAAAC 



BLAST Results 



Entry HSG19750 from database EMBL: 
human STS A001X24. 
Score = 1050, P = 1.9e-39, identities = 212/213 

Entry HSG20267 from database EMBL: 
human STS A005C12. 
Score = 610, P = 4.1e-19, identities = 122/122 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 213 bp to 527 bp; peptide length: 105 
Category: strong similarity to known protein 
Classification: unset 



1 MSIYFPIHCP DYLRSAKMTE VMMNTQPMEE IGLSPRKDGL SYQIFPDPSD 
51 FDRRCKLKDR LPSIVVEPTE GEVESGELRW PPEEFLVQED EQDNCEETAK 
101 ENKEQ . 
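
As a cross-check of the frame 3 ORF, the first codons of the insert from position 213 onward can be translated with the standard genetic code. The sketch below does this for a 36-nucleotide fragment (positions 213 to 248 of the insert sequence); the helper name `translate` and the compact codon-table encoding are illustrative choices made here, not something taken from the original analysis.

```python
from itertools import product

# Standard genetic code, encoded in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Positions 213-248 of the insert, i.e. the first twelve codons of the ORF.
FRAGMENT = "ATGTCTATATATTTCCCCATTCACTGCCCCGACTAT"
print(translate(FRAGMENT))
```

The translated fragment can be compared directly with the first twelve residues of the peptide listed above.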

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_lj9, frame 3 

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P = 
8e-42 

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P = 
8.2e-42 



>PIR:S52241 XLCL2 protein - African clawed frog 
Length = 102 

HSPs: 

Score = 443 (66.5 bits), Expect = 8.0e-42, P = 8.0e-42 
Identities = 80/104 (76%), Positives = 95/104 (91%) 

Query:  1 MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 60 
          MS+++PIHC DYLRSA+MTEV+MNTQ M+EIGLSPRKD  SYQIFPDPSDF+R CKLKDR 
Sbjct:  1 MSVFYPIHCTDYLRSAEMTEVIMNTQSMDEIGLSPRKD--SYQIFPDPSDFERCCKLKDR 58 

Query: 61 LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKE 104 
          LPSIVVEPTEG+VESGELRWPPEEF+V ED++  C++T KEN++ 
Sbjct: 59 LPSIVVEPTEGDVESGELRWPPEEFVVDEDKEGTCDQTKKENEQ 102 
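
The percentages printed next to the identity and positive counts in these HSP headers appear to be truncated rather than rounded: 80/104 is about 76.9% but is listed as 76%. A small sketch of that arithmetic follows, using a hypothetical helper `blast_percent`; that the reporting tool truncates is an inference from the figures in this listing, not a documented statement about the software.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage as it appears in the HSP headers (truncated, not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Figures taken from the XLCL2 alignment header.
print(blast_percent(80, 104), blast_percent(95, 104))
```

Rounding instead of truncating would give 77% for the identities of this HSP, which is why the distinction matters when recomputing these values.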



Pedant information for DKFZphfkd2_lj9, frame 3 



Report for DKFZphfkd2_lj9.3 



[LENGTH] 105 

[MW] 12269.78 

[pi] 4.40 

[HOMOL] PIR:S52241 XLCL2 protein - African clawed frog 5e-44 

[KW] Alpha_Beta 

SEQ MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 
PRD cccccccccccchhhhhhhhhhhhcccccccccccccccceeeecccccccchhhhhhhc 

SEQ LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKEQ 
PRD ccceeeecccccccccccccccccceeeccccchhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_lj9.3) 
(No Pfam data available for DKFZphfkd2_lj9.3) 

DKFZphfkd2_24al5 



group: transmembrane protein 

DKFZphfkd2_24al5 encodes a novel 323 amino acid protein with similarity to C. elegans cosmid 
R07G3. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 



similarity to C. elegans R07G3.8 
membrane regions: 1 

Summary: DKFZphfkd2_24al5 encodes a novel 323 amino acid protein, with 
similarity to C. elegans R07G3.8. 



similarity to C. elegans R07G3.8 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1513 bp 

Poly A stretch at pos. 1494, no polyadenylation signal found 



1 GGGGTACTCG GCGGCGGCGG AGCGGGCGGC AGAGCAGGGC GGCGGCGACT 
51 CGCAGGGTAC CACCATCTTA AGGACAGAAA AGCTACAGGA CTCTAGGAGG 
101 CCACCGTCCT GATTTGGGAA GTCCAACTTA CTTTGGCCAG ACAGCAGCTA 
151 AGCTGGTTCA TCCCATCAGC CTGGATTGGT GAAACTGAAT CACAGGAGAT 
201 ATTTCCAGGT TTGCTGGGAT GGGAAACCTG CTCAAAGTCC TTACCAGGGA 
251 AATTGAAAAC TATCCACACT TTTTCCTGGA TTTTGAAAAT GCTCAGCCTA 
301 CAGAAGGAGA GAGAGAAATC TGGAACCAGA TCAGCGCCGT CCTTCAGGAT 
351 TCTGAGAGCA TCCTTGCAGA CCTGCAGGCT TACAAAGGCG CAGGCCCAGA 
401 GATCCGAGAT GCAATTCAAA ATCCCAATGA CATTCAGCTT CAAGAAAAAG 
451 CTTGGAATGC GGTGTGCCCT CTTGTTGTGA GGCTAAAGAG ATTTTACGAG 
501 TTTTCCATTA GACTAGAAAA AGCTCTTCAG AGTTTATTGG AATCTCTGAC 
551 TTGTCCACCC TACACACCAA CCCAACACCT GGAAAGGGAA CAGGCCCTGG 
601 CAAAGGAGTT TGCCGAAATT TTACATTTTA CCCTTCGATT CGATGAGCTG 
651 AAGATGAGGA ACCCGGCTAT TCAGAATGAC TTCAGCTACT ACAGAAGAAC 
701 AATCAGTCGC AACCGCATCA ACAACATGCA CCTAGACATT GAGAATGAAG 
751 TCAATAATGA GATGGCCAAT CGAATGTCCC TCTTCTATGC AGAAGCCACG 
801 CCAATGCTGA AAACCCTTAG CAATGCCACA ATGCACTTTG TCTCTGAAAA 
851 CAAAACTCTG CCAATAGAGA ACACCACAGA CTGCCTCAGC ACAATGACAA 
901 GTGTCTGTAA AGTCATGCTG GAAACTCCGG AGTACAGAAG TAGGTTTACG 
951 AGTGAAGAGA CCCTGATGTT CTGCATGAGG GTGATGGTGG GAGTCATCAT 
1001 CCTCTATGAC CATGTCCACC CTGTGGGAGC TTTCTGCAAG ACATCCAAGA 
1051 TCGATATGAA AGGCTGCATA AAAGTTTTGA AGGAGCAGGC CCCAGACAGT 
1101 GTGGAGGGGC TGCTAAATGC CCTCAGGTTC ACTACAAAGC ACTTGAACGA 
1151 TGAATCAACT TCCAAACAGA TTCGAGCAAT GCTTCAGTAG AGCTCTGCTC 
1201 AAAGAAGAGG ATCTATGTGC TGACCTCAGA AGATGTATAT GTTTACATAA 
1251 TTTAATACAG ATTGATGTTA ATACTTGTGT ATTTACATAA CCGTTTCCTT 
1301 CTTGTCACTG AAATATATGG ACCTTAATTT GTATCCTGAC TGACTCAACC 
1351 CAGCAGAGCA TAAATTGACT TGAGAGCCTT ACCTTTGATG TCTGAAATGA 
1401 AACCCCCTTC TCCAAAGGCA AAATTCGGAG ACTTTGATCT TTGCTACTGG 
1451 AGTCCTTTAA CAACATCTAT AACGATAAAA AATTCCTAAT TGTCAAAAAA 
1501 AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 

ORF from 219 bp to 1187 bp; peptide length: 323 
Category: similarity to unknown protein 



1 MGNLLKVLTR EIENYPHFFL DFENAQPTEG EREIWNQISA VLQDSESILA 

51 DLQAYKGAGP EIRDAIQNPN DIQLQEKAWN AVCPLVVRLK RFYEFSIRLE 

101 KALQSLLESL TCPPYTPTQH LEREQALAKE FAEILHFTLR FDELKMRNPA 

151 IQNDFSYYRR TISRNRINNM HLDIENEVNN EMANRMSLFY AEATPMLKTL 

201 SNATMHFVSE NKTLPIENTT DCLSTMTSVC KVMLETPEYR SRFTSEETLM 

251 FCMRVMVGVI ILYDHVHPVG AFCKTSKIDM KGCIKVLKEQ APDSVEGLLN 

301 ALRFTTKHLN DESTSKQIRA MLQ 

BLASTP hits 

Entry CER07G3_7 from database TREMBL: 

gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 

Score = 544, P = 1.4e-52, identities = 119/323, positives = 186/323 



Alert BLASTP hits for DKFZphfkd2_24al5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_24al5, frame 3 



Report for DKFZphfkd2_24al5.3 



[LENGTH] 323 

[MW] 37313.06 

[pi] 5.71 

[HOMOL] TREMBL:CER07G3_7 gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 4e-54 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] TRANSMEMBRANE 1 



SEQ MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILADLQAYKGAGP 

PRD ccccchhhhhhhhcccceeecccccccchhhhhhhhhhhhhhhcchhhhhhhhhhccccc 

MEM 

SEQ EIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLEKALQSLLESLTCPPYTPTQH 

PRD hhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhh 

MEM 

SEQ LEREQALAKEFAEILHFTLRFDELKMRNPAIQNDFSYYRRTISRNRINNMHLDIENEVNN 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhccchhhhhhhhhhhhhhhh 

MEM 

SEQ EMANRMSLFYAEATPMLKTLSNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYR 

PRD hhhhhhhhhhhhccchhhhhhhhceeecccccccccccccceeeeehhhhhhhhcccccc 

MEM 

SEQ SRFTSEETLMFCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN 

PRD cccccchhhhhhhhhhhheeeeeeeccccccccccccccchhhhhhhhhccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ ALRFTTKHLNDESTSKQIRAMLQ 

PRD hhhhhhcccccccchhhhhhccc 

MEM 



Prosite for DKFZphfkd2_24al5.3 



PS00001  202->206  ASN_GLYCOSYLATION  PDOC00001 
PS00001  211->215  ASN_GLYCOSYLATION  PDOC00001 
PS00001  218->222  ASN_GLYCOSYLATION  PDOC00001 
PS00005  96->99    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  138->141  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  275->278  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  305->308  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  314->317  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  28->32    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  105->109  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  244->248  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  276->280  CK2_PHOSPHO_SITE   PDOC00006 
PS00007  231->240  TYR_PHOSPHO_SITE   PDOC00007 
PS00008  297->303  MYRISTYL           PDOC00008 



(No Pfam data available for DKFZphfkd2_24al5.3) 
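
The PKC_PHOSPHO_SITE hits reported for this peptide can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE consensus for PS00005, [ST]-x-[RK], which comes from the PROSITE documentation rather than from this document, and uses a hypothetical helper name `pkc_sites`. It returns 1-based start positions; in this listing the second coordinate of each hit equals the start plus the pattern length (for example 96->99 for a three-residue site starting at 96).

```python
import re

# 323-residue peptide of frame 3, copied from the peptide listing of this clone.
PEPTIDE = (
    "MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILA"
    "DLQAYKGAGPEIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLE"
    "KALQSLLESLTCPPYTPTQHLEREQALAKEFAEILHFTLRFDELKMRNPA"
    "IQNDFSYYRRTISRNRINNMHLDIENEVNNEMANRMSLFYAEATPMLKTL"
    "SNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYRSRFTSEETLM"
    "FCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN"
    "ALRFTTKHLNDESTSKQIRAMLQ"
)

# Assumed PROSITE PS00005 consensus: [ST]-x-[RK].
# The lookahead makes the scan report overlapping sites as well.
PKC_PATTERN = re.compile(r"(?=[ST].[RK])")


def pkc_sites(sequence: str) -> list[int]:
    """Return 1-based start positions of every [ST]-x-[RK] match."""
    return [match.start() + 1 for match in PKC_PATTERN.finditer(sequence)]


print(pkc_sites(PEPTIDE))
```

Short consensus patterns of this kind match by chance in most proteins, so such hits indicate candidate sites only and are not evidence of actual phosphorylation.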

DKFZphfkd2_24bl5 



group: metabolism 

DKFZphfkd2_24bl5 encodes a novel 612 amino acid protein with similarity to bacterial and yeast 
phosphoglucomutase and phosphomannomutases. 

The novel protein contains a phosphoserine signature typical for phosphoglucomutase (EC 
5.4.2.2) or phosphomannomutase (EC 5.4.2.8). Thus, the protein seems to be taking part in the 
conversion of hexose phosphates. 

The new protein can find application in modulation of hexose metabolism pathways and as a new 
enzyme for biotechnologic production processes. 



similarity to phosphomannomutases 
complete cDNA, complete cds, EST hits 

potential start at bp 30 matches Kozak consensus PyCNatgG 
Sequenced by GBF 

Locus: map="158.8 cR from top of Chr4 linkage group" 
Insert length: 2204 bp 

Poly A stretch at pos. 2186, no polyadenylation signal found 



1 GGGCTCTGCA GCGGTAGCAC AAGCTCAGCG ATGGCGGCTC CAGAAGGCAG 
51 CGGTCTAGGC GAGGACGCCC GGCTGGACCA GGAGACCGCC CAGTGGCTGC 
101 GCTGGGACAA GAATTCCTTA ACTTTGGAGG CAGTGAAACG ACTAATAGCA 
151 GAAGGTAATA AAGAAGAACT ACGAAAATGT TTTGGGGCCC GAATGGAGTT 
201 TGGGACAGCT GGCCTCCGAG CTGCTATGGG ACCTGGAATT TCTCGTATGA 
251 ATGACTTGAC CATCATCCAG ACTACACAGG GATTTTGCAG ATACCTGGAA 
301 AAACAATTCA GTGACTTAAA GCAGAAAGGC ATCGTGATCA GTTTTGACGC 
351 CCGAGCTCAT CCATCCAGTG GGGGTAGCAG CAGAAGGTTT GCCCGACTTG 
401 CTGCAACCAC ATTTATCAGT CAGGGGATTC CTGTGTACCT CTTTTCTGAT 
451 ATAACGCCAA CCCCCTTTGT GCCCTTCACA GTATCACATT TGAAACTTTG 
501 TGCTGGAATC ATGATAACTG CATCTCACAA TCCAAAGCAG GATAATGGTT 
551 ATAAGGTCTA TTGGGATAAT GGAGCTCAGA TCATTTCTCC TCACGATAAA 
601 GGGATTTCTC AAGCTATTGA AGAAAATCTA GAACCGTGGC CTCAAGCTTG 
651 GGACGATTCT TTAATTGATA GCAGTCCACT TCTCCACAAT CCGAGTGCTT 
701 CCATCAATAA TGACTACTTT GAAGACCTTA AAAAGTACTG TTTCCACAGG 
751 AGCGTGAACA GGGAGACAAA GGTGAAGTTT GTGCACACCT CTGTCCATGG 
801 GGTGGGTCAT AGCTTTGTGC AGTCAGCTTT CAAGGCTTTT GACCTTGTTC 
851 CTCCTGAGGC TGTTCCTGAA CAGAGAGATC CGGATCCTGA GTTTCCAACA 
901 GTGAAATACC CGAATCCCGA AGAGGGGAAA GGTGTCTTGA CTTTGTCTTT 
951 TGCTTTGGCT GACAAAACCA AGGCCAGAAT TGTTTTAGCT AACGACCCGG 
1001 ATGCTGATAG ACTTGCTGTG GCAGAAAAGC AAGACAGTGG TGAATGGAGG 
1051 GTGTTTTCAG GCAATGAGTT GGGGGCCCTC CTGGGCTGGT GGCTTTTTAC 
1101 ATCTTGGAAA GAGAAGAACC AGGATCGCAG TGCTCTCAAA GACACGTACA 
1151 TGTTGTCCAG CACCGTCTCC TCCAAAATCT TGCGGGCCAT TGCCTTAAAG 
1201 GAAGGTTTTC ATTTTGAGGA AACATTAACT GGCTTTAAGT GGATGGGAAA 
1251 CAGAGCCAAA CAGCTAATAG ACCAGGGGAA AACTGTTTTA TTTGCATTTG 
1301 AAGAAGCTAT TGGATACATG TGCTGCCCTT TTGTTCTGGA CAAAGATGGA 
1351 GTCAGTGCCG CTGTCATAAG TGCAGAGTTG GCTAGCTTCC TAGCAACCAA 
1401 GAATTTGTCT TTGTCTCAGC AACTAAAGGC CATTTATGTG GAGTATGGCT 
1451 ACCATATTAC TAAAGCTTCC TATTTTATCT GCCATGATCA AGAAACCATT 
1501 AAGAAATTAT TTGAAAACCT CAGAAACTAC GATGGAAAAA ATAATTATCC 
1551 AAAAGCTTGT GGCAAATTTG AAATTTCTGC CATTAGGGAC CTTACAACTG 
1601 GCTATGATGA TAGCCAACCT GATAAAAAAG CTGTTCTTCC CACTAGTAAA 
1651 AGCAGCCAAA TGATCACCTT CACCTTTGCT AATGGAGGCG TGGCCACCAT 
1701 GCGCACCAGT GGGACAGAGC CCAAAATCAA GTACTATGCA GAGCTGTGTG 
1751 CCCCACCTGG GAACAGTGAT CCTGAGCAGC TGAAGAAGGA ACTGAATGAA 
1801 CTGGTCAGTG CTATTGAAGA ACATTTTTTC CAGCCACAGA AGTACAATCT 
1851 GCAGCCAAAA GCAGACTAAA ATAGTCCAGC CTTGGGTATA CTTGCATTTA 
1901 CCTACAATTA AGCTGGGTTT AACTTGTTAA GCAATATTTT TAAGGGCCAA 
1951 ATGATTCAAA ACATCACAGG TATTTATGTG TTTTACAAAG ACCTACATTC 
2001 CTCATTGTTT CATGTTTGAC CTTTAAGGTG AAAAAAGAAA ATGGCCAAAC 
2051 CCAACAAACT AACATTCCTA CTAAAAAGTT GAGCTTGGAC ATATTTTGAA 
2101 TTTTTGTAAG TGAAGATTTT TAAACTGACT AACTTAAAAA AATAGATTGT 
2151 AATTGATGTG CCTTAATTTG CATAAATCAT AAATGTAAAA AAAAAAAAAA 
2201 AAAA 



BLAST Results 



Entry HS705145 from database EMBL: 
human STS wi-6820. 
Score = 1261, P = 3.6e-52, identities = 253/254 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 31 bp to 1866 bp; peptide length: 612 
Category: strong similarity to known protein 



1 MAAPEGSGLG EDARLDQETA QWLRWDKNSL TLEAVKRLIA EGNKEELRKC 
51 FGARMEFGTA GLRAAMGPGI SRMNDLTIIQ TTQGFCRYLE KQFSDLKQKG 
101 IVISFDARAH PSSGGSSRRF ARLAATTFIS QGIPVYLFSD ITPTPFVPFT 
151 VSHLKLCAGI MITASHNPKQ DNGYKVYWDN GAQIISPHDK GISQAIEENL 
201 EPWPQAWDDS LIDSSPLLHN PSASINNDYF EDLKKYCFHR SVNRETKVKF 
251 VHTSVHGVGH SFVQSAFKAF DLVPPEAVPE QRDPDPEFPT VKYPNPEEGK 
301 GVLTLSFALA DKTKARIVLA NDPDADRLAV AEKQDSGEWR VFSGNELGAL 
351 LGWWLFTSWK EKNQDRSALK DTYMLSSTVS SKILRAIALK EGFHFEETLT 
401 GFKWMGNRAK QLIDQGKTVL FAFEEAIGYM CCPFVLDKDG VSAAVISAEL 
451 ASFLATKNLS LSQQLKAIYV EYGYHITKAS YFICHDQETI KKLFENLRNY 
501 DGKNNYPKAC GKFEISAIRD LTTGYDDSQP DKKAVLPTSK SSQMITFTFA 
551 NGGVATMRTS GTEPKIKYYA ELCAPPGNSD PEQLKKELNE LVSAIEEHFF 
601 QPQKYNLQPK AD 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24bl5, frame 1 

TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid 
Y43F4B, N = 1, Score = 1431, P = 1.6e-146 

TREMBL:SPCC1840_5 gene: "SPCC1840.05c"; product: "similarity to 
phosphomannomutases"; S.pombe chromosome III cosmid c1840., N = 1, 
Score = 1210, P = 4.2e-123 

PIR:S54585 hypothetical protein YMR278w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 1046, P = 1e-105 

PIR:A71299 probable phosphomannomutase (manB) - syphilis spirochete, N 
= 1, Score = 697, P = 9.7e-69 

>TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 
Length = 595 

HSPs : 

Score = 1431 (214.7 bits), Expect = 1.6e-146, P = 1.6e-146 
Identities = 285/598 (47%), Positives = 393/598 (65%) 



Query: 


13 


Sbjct: 


6 


Query: 


73 


Sbjct: 


66 


Query: 


133 


Sbjct: 


119 


Query: 


193 


Sbjct: 


179 


Query: 


253 


Sbjct: 


238 



A+LD++ A WL WDKN 



+NDLTIIQ T GF R++ 



IPVYLFS+++PTP V + 



+++L+ E N + L+ 



K G+ I FD R + 



R+ FGTAG+R+ M 



SRRFA L+A F+ 
-SRRFAELSANVFVRNN 118 



AG++ITASHNPK+DNGYK YW NGAQII PHD I 



+P + WD S + SSPL H+ 



1+ YFE K F R +N T +KF + 



++ HG+G+ + + 



+V EQ+DP+P+FPT+ +PNPEEG+ VLTL+ 

Query: 311 DKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWKEKNQDRSALK 370 
           DK  + ++LANDPDADR+ +AEKQ  GEWRVF+GNE+GAL+ WW++T+W++ N +  A K 
Sbjct: 298 DKNGSTVILANDPDADRIQMAEKQKDGEWRVFTGNEMGALITWWIWTNWRKANPNADASK 357 

Query: 371 DTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVLFAFEEAIGYM 430 
             Y+L+S VSS+I++ IA  EGF  E TLTGFKWMGNRA++L   G  V+ A+EE+IGYM 
Sbjct: 358 -VYILNSAVSSQIVKTIADAEGFKNETTLTGFKWMGNRAEELRADGNQVILAWEESIGYM 416 

Query: 431 CCP-FVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKASYFICHDQET 489 
             P   +DKDGVSAA + AE+A+FL  +  SL  QL A+Y  YG+H+ +++Y++    E 
Sbjct: 417 --PGHTMDKDGVSAAAVFAEIAAFLHAEGKSLQDQLYALYNRYGFHLVRSTYWMVPAPEV 474 

Query: 490 IKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSKSSQMITFTF 549 
            KKLF  LR  D K  +P   G+ E++++RDLT GYD+S+PD K VLP S SS+M+TF 
Sbjct: 475 TKKLFSTLRA-DLK--FPTKIGEAEVASVRDLTIGYDNSKPDNKPVLPLSTSSEMVTFFL 531 

Query: 550 ANGGVATMRTSGTEPKIKYYAELCAPPGNS--DPEQLKKELNELVSAIEEHFFQPQKYNL 607 
             G V T+R SGTEPKIKYY EL   PG +  D E +  E+++L   +     +PQ++ L 
Sbjct: 532 KTGSVTTLRASGTEPKIKYYIELITAPGKTQNDLESVISEMDQLEKDVVATLLRPQQFGL 591 

Query: 608 QPK 610 
            P+ 
Sbjct: 592 IPR 594 



Pedant information for DKFZphfkd2_24bl5, frame 1 



Report for DKFZphfkd2_24bl5.1 



[LENGTH] 612 

[MW] 68311.58 

[pi] 6.28 

[HOMOL] TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 1e-157 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YMR278w] 1e-111 

[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0740] 3e-66 

[FUNCAT] c energy conversion [M. genitalium, MG053] 4e-50 

[FUNCAT] m outer membrane and cell wall [H. influenzae, HI1463] 2e-04 

[BLOCKS] BL00607D cAMP phosphodiesterases class-II proteins 

[BLOCKS] BL00710 Phosphoglucomutase and phosphomannomutase phosphoserine signa 

[EC] 5.4.2.8 Phosphomannomutase 3e-56 

[EC] 5.4.2.2 Phosphoglucomutase 1e-09 

[PIRKW] isomerase 3e-56 

[PIRKW] intramolecular transferase 3e-56 

[SUPFAM] Methanobacterium thermoautotrophicum phosphomannomutase 1e-06 

[SUPFAM] probable phosphorylating protein ureC 9e-06 

[PROSITE] PGM_PMM 1 

[PROSITE] MYRISTYL 10 

[PROSITE] LIPOCALIN 2 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Phosphoglucomutase and phosphomannomutase phosphoserine 

[KW] Alpha_Beta 



SEQ MAAPEGSGLGEDARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTA 

PRD ccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhcchhhhhhhhhhhhccccc 

SEQ GLRAAMGPGISRMNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRF 

PRD cccccccccccccceeeeeehhhhhhhhhhhhcccccceeeeeecccccccccccchhhh 

SEQ ARLAATTFISQGIPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDN 

PRD hhhhhhhhhhccceeeeeccccccccchhhhhhhcccceeeeeeccccccccceeeeecc 

SEQ GAQIISPHDKGISQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHR 

PRD ccccccccchhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhcc 

SEQ SVNRETKVKFVHTSVHGVGHSFVQSAFKAFDLVPPEAVPEQRDPDPEFPTVKYPNPEEGK 

PRD ccccccceeeeeeeccccccchhhhhhhhhcccccccccccccccccccccccccccchh 

SEQ GVLTLSFALADKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWK 

PRD hhhhhhhhhhhhhcceeeeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhh 

SEQ EKNQDRSALKDTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVL 

PRD hcccccccccceeeeeeeehhhhhhhhhhhcccceeeeeccccchhhhhhhhhhccceee 

SEQ FAFEEAIGYMCCPFVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKAS 

PRD hhhhhccccccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccccc 

SEQ YFICHDQETIKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSK 

PRD eeeccchhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccccccc 

SEQ SSQMITFTFANGGVATMRTSGTEPKIKYYAELCAPPGNSDPEQLKKELNELVSAIEEHFF 

PRD ccceeeeeecccceeeeecccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhh 

SEQ QPQKYNLQPKAD 

PRD cccccccccccc 

PS00001  458->462  ASN_GLYCOSYLATION  PDOC00001 
PS00002  ->11      GLYCOSAMINOGLYCAN  PDOC00002 
PS00005  116->119  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  117->120  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  290->293  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  358->361  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  380->383  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  489->492  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  538->541  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  556->559  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  186->190  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  210->214  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  343->347  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  358->362  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  523->527  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  528->532  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  560->564  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  579->583  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  593->597  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  6->12     MYRISTYL           PDOC00008 
PS00008  61->67    MYRISTYL           PDOC00008 
PS00008  100->106  MYRISTYL           PDOC00008 
PS00008  159->165  MYRISTYL           PDOC00008 
PS00008  191->197  MYRISTYL           PDOC00008 
PS00008  257->263  MYRISTYL           PDOC00008 
PS00008  344->350  MYRISTYL           PDOC00008 
PS00008  348->354  MYRISTYL           PDOC00008 
PS00008  440->446  MYRISTYL           PDOC00008 
PS00008  552->558  MYRISTYL           PDOC00008 
PS00710  159->174  PGM_PMM            PDOC00589 
PS00213  346->358  LIPOCALIN          PDOC00187 
PS00213  344->358  LIPOCALIN          PDOC00187 

Pfam for DKFZphfkd2_24b15.1



HMM_NAME Phosphoglucomutase and phosphomannomutase phosphoserine 

HMM *GvnVIdIGQNGMMPTPMIYFaIRTYKhmcmggGIMITaSHNPGGPDnDN 
G+ V + ++PTP + F + H+++ +GIMITASHNP DN 

Query 132 GI PVYLFS — DITPTPFVPFTVS HLKLCAGIMITASHNP— KQ-DN 172 



HMM GIK* 
G+K 

Query 173 GYK 175 
PCT/IB00/01496



group: kidney derived 

DKFZphfkd2_24e23 encodes a novel 198 amino acid protein without similarity to
known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of 
kidney-specific genes. 



unknown 

complete cDNA, complete cds, 1 EST hit, 
many ATGs in front of the ORF 

Sequenced by GBF 

Locus : unknown 

Insert length: 1723 bp 

Poly A stretch at pos. 1695, no polyadenylation signal found



1 GGGGGATTTT CGATCATGAC AACGATAGCA ATTGATATAC CTTCAAAATA
51 CGTGTCCAGT GAGTGTTGAT TGTGTGTGGT TTCTCTAGGA GACCGTGTTC
101 ATGCAACACA GCATTATTTC ACCGCCTTTA CCCCAGCTTC TTCATACACA
151 TGCACTTGTC AAGGGCTCTT TGGCTGAAGA GAAGTTAGAA GTTTCCAGAT
201 ATGGAGGGGT ATTTTCAGCA GATATGCCCA CCGCCATGGT TTTGTCAGCT
251 CTGTAGGGTG GTCTTGCACC CTGCTCACTG CTGGCATCAC CTGAGCCTAT
301 GGCAGATACC CAGTGCTGCC CGCCACCATG TGAATTCATC AGCTCTGCAG
351 GCACAGACCT TGCACTAGGA ATGGGCTGGG ACGCCACCCT CTGCCTCTTA
401 CCATTCACTG GGTTTGGCAA GTGTGCTGGG ATCTGGAATC ACATGGATGA
451 GGAACCCGAT AATGGTGACG ACCGAGGTAG CAGGCGAACC ACTGGCCAGG
501 GCAGGAAGTG GGCAGCTCAC GGGACTATGG CTGCACCGCG GGTTCATACC
551 GACTACCATC CTGGAGGTGG GAGCGCATGC TCATCTGTAA AAGTCCGGTC
601 CCACGTTGGA CACACCGGGG TCTTCTTCTT TGTTGACCAG GATCCTCTGG
651 CAGTGTCTTT AACAAGCCAG AGTCTGATCC CACCGCTCAT AAAGCCAGGG
701 TTGTTGAAAG CTTGGGGCTT CCTCCTCCTC TGTGCGCAGC CCTCAGCAAA
751 CGGTCACAGC CTGTGCTGTC TGCTGTACAC CGACTTGGTA TCATCCCATG
801 AACTGTCCCC CTTTCGTGCT CTGTGCTTAG GGCCCTCTGA TGCCCCATCT
851 GCCTGCGCTT CCTGCAACTG TTTAGCAAGC ACCTATTATC TATAGGGTGC
901 TGGGGTGCTG GGCGAGGCCA ATCGCTCCTA TTACTTTCTG CCCTGGGGAC
951 GTCCTGTTTT CCCACCTACC CCTGTAACGC CTCTGCTCTG CCTTCCCATC
1001 TGCGGGGCTA ACGCCATCCC ACAAGGGCTG GGCTGTCCGT TCAGAAGAGA
1051 AACTGGGAAG GGGCCTTGAG GACCTGTGTC CAGGCAGGGT GGACAAGGGC
1101 TTTGTGCAGG GAGCTCCTCT CCCATCTTTG TGTCCTGACA GCCGTGACCG
1151 TGACCCCTCA AAGCAGAGCC AGTAGTGATC AGTATCCTGC TGCTTCAAGC
1201 CTGCACGGTC CTCTTCTCCT CTCCGCACAT CTGCATGCCT GTCAAACCCA
1251 GAGTAGTTTG GGGCCTGGTA AACAGAGGGA AGTTGGCTGG AGGAGGCCAG
1301 TCAGGAGTGC AAGAACCCCG CGTACTCTGT CCCACGTGGA TAAAGTCTCT
1351 AATTCCAGTC TGAGGTGAAT TCTTAGAGAG TGCTTTCATT TAATGTTTGC
1401 TTTATGCATT TCCCCTGCAG CTGTGACTAA TTGTGGAACA GCATACATTT
1451 TGTTTTGAGA CTCTCTTGAG ATTTTTCTGG CAGTGTAAGG TCTACACCAT
1501 TTTCCTCTCA GCATCAGAGA AGGCAGAAAG CAAGAGAAAG GAATGCAATG
1551 TGAGCAAGGC CAGGCACACT TGTGCTACTG CAGTTGGCAA GAATGGAGTC
1601 TAATCCCAGC ACTTTGGGAG GCCGAGGCGG GTGGATCACC TGAGGTCAGG
1651 AATTTGAGAC CAACCTGGCC AACATGTTGA AACCTCGTCT GTACTAAAAA
1701 TACAAAAAAA AAAAAAAAAA AAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 
ORF from 299 bp to 892 bp; peptide length: 198
Category: putative protein 



1 MADTQCCPPP CEFISSAGTD LALGMGWDAT LCLLPFTGFG KCAGIWNHMD 

51 EEPDNGDDRG SRRTTGQGRK WAAHGTMAAP RVHTDYHPGG GSACSSVKVR 

101 SHVGHTGVFF FVDQDPLAVS LTSQSLIPPL IKPGLLKAWG FLLLCAQPSA 

151 NGHSLCCLLY TDLVSSHELS PFRALCLGPS DAPSACASCN CLASTYYL 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_24e23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphf kd2_24e23, frame 2 



Report for DKFZphf kd2_24e23 . 2 



[LENGTH] 198 

[MW] 20948.98 

[pI] 6.01

[PROSITE] MYRISTYL 5

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 2

[KW] All_Beta

[KW] LOW_COMPLEXITY 6.06 %



SEQ MADTQCCPPPCEFISSAGTDLALGMGWDATLCLLPFTGFGKCAGIWNHMDEEPDNGDDRG

SEG 

PRD ccccccccccccccccccccccccccccceeeeeccccccceeeeccccccccccccccc 

SEQ SRRTTGQGRKWAAHGTMAAPRVHTDYHPGGGSACSSVKVRSHVGHTGVFFFVDQDPLAVS 

SEG 

PRD cccccccccccccccccccceeeeecccccccccceeeeeeeccccceeeeeccccceee 

SEQ LTSQSLIPPLIKPGLLKAWGFLLLCAQPSANGHSLCCLLYTDLVSSHELSPFRALCLGPS

SEG xxxxxxxxxxxx 

PRD eccccccccccccchhhhhhhhhhhccccccccceeeeeeeeeccccccccceeeecccc 

SEQ DAPSACASCNCLASTYYL 

SEG 

PRD cccccccccccccccccc 
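
The PRD lines above give one predicted secondary-structure state per residue. As a minimal sketch, assuming the usual convention that `h` means helix, `e` means strand and `c` means coil (the report itself does not spell this out), the composition behind a keyword such as All_Beta can be tallied as follows. The function name `ss_composition` is hypothetical.

```python
from collections import Counter

def ss_composition(prd: str) -> dict:
    """Return the percentage of h, e and c states in a prediction string.

    Characters other than h/e/c (spaces, line breaks) are ignored.
    """
    states = [ch for ch in prd if ch in "hec"]
    total = len(states)
    if total == 0:
        return {"h": 0.0, "e": 0.0, "c": 0.0}
    counts = Counter(states)
    return {s: round(100.0 * counts.get(s, 0) / total, 1) for s in "hec"}

# PRD lines of the report above, concatenated in order
PRD = (
    "ccccccccccccccccccccccccccccceeeeeccccccceeeeccccccccccccccc"
    "cccccccccccccccccccceeeeecccccccccceeeeeeeccccceeeeeccccceee"
    "eccccccccccccchhhhhhhhhhhccccccccceeeeeeeeeccccccccceeeecccc"
    "cccccccccccccccccc"
)

if __name__ == "__main__":
    print(ss_composition(PRD))
```

The percentages are only as reliable as the OCR of the PRD lines, so they are best read as a rough cross-check of the reported keyword.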



Prosite for DKFZphfkd2_24e23.2

PS00004   62->66     CAMP_PHOSPHO_SITE   PDOC00004
PS00005   61->64     PKC_PHOSPHO_SITE    PDOC00005
PS00005   96->99     PKC_PHOSPHO_SITE    PDOC00005
PS00006   165->169   CK2_PHOSPHO_SITE    PDOC00006
PS00008   18->24     MYRISTYL            PDOC00008
PS00008   60->66     MYRISTYL            PDOC00008
PS00008   89->95     MYRISTYL            PDOC00008
PS00008   91->97     MYRISTYL            PDOC00008
PS00008   134->140   MYRISTYL            PDOC00008
PS00009   67->71     AMIDATION           PDOC00009


(No Pfam data available for DKFZphfkd2_24e23.2)
DKFZphfkd2_24n20



group: intracellular transport and trafficking 

DKFZphfkd2_24n20.3 encodes a novel 366 amino acid protein with similarity to human eps8
binding protein e3B1 and spectrins.

The new protein contains a Src homology 3 (SH3) domain and is similar to human eps8 SH3 domain
binding protein 1 (e3B1) and spectrins. Eps8 is a substrate of receptor tyrosine kinases
involved in mitogenic signaling. Spectrin is part of the submembrane cytoskeletal network in
the human erythrocyte ghost. Nonerythroid spectrins are proposed to have roles in cell
adhesion, establishment of cell polarity, and attachment of other cytoskeletal structures to
the plasma membrane. The new protein seems to be part of the signalling pathway between
tyrosine kinases and the membrane/cytoskeleton.

The new protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.



strong similarity to eps8 binding protein e3B1
complete cDNA, complete cds, few EST hits

potential start at Bp 300, but there are ATGs in other frames in
5' region of the cDNA

Sequenced by GBF 

Locus: /map="17" 

Insert length: 1719 bp 

Poly A stretch at pos. 1699, polyadenylation signal at pos. 1680



1 GGGGACAGCT GCCCCGACCT TGGCTTCCTC TGCTGGGTGG GATTGGGGGC 

51 TGGGCCCCCA AATGGGCCCC TGGCTTCCCC CTTCCTCTGG GCAGGGGACA 

101 GAGAGACACA GGCTCGGGGA GCAGGACTGA CTTCCTCTTG TCCCGGAATG 

151 AGCATGCCTG CCCTTTGCAA GCAGGTTTGG GTCTCACGCA GAGGAAACCA 

201 AAAGCAATAA GAGGGAGGGA AGGCAGAGCA ACCAATCAAG GGCAGGGTGA

251 GACTCAAAAC GAGCGGGCTC CCTGGGGAGC CAGACAGAGG CTGGGGGTGA 

301 TGGCGGAGCT ACAGCAGCTG CAGGAGTTTG AGATCCCCAC TGGCCGGGAG 

351 GCTCTGAGGG GCAACCACAG TGCCCTGCTG CGGGTCGCTG ACTACTGCGA 

401 GGACAACTAT GTGCAGGCCA CAGACAAGCA GAAGGCGCTG GAGGAGACCA 

451 TGGCCTTCAC TACCCAGGCA CTGGCCAGCG TGGCCTACCA GGTGGGCAAC

501 CTGGCCGGGC ACACTCTGCG CATGTTGGAC CTGCAGGGGG CCGCCCTGCG 

551 GCAGGTGGAA GCCCGTGTAA GCACGCTGGG CCAGATGGTG AACATGCATA 

601 TGGAGAAGGT GGCCCGAAGG GAGATCGGCA CCTTAGCCAC TGTCCAGCGG 

651 CTGCCCCCCG GCCAGAAGGT CATCGCCCCA GAGAACCTAC CCCCTCTCAC 

701 GCCCTACTGC AGGAGACCCC TCAACTTTGG CTGCCTGGAC GACATTGGCC 

751 ATGGGATCAA GGACCTCAGC ACGCAGCTGT CAAGAACAGG CACCCTGTCT 

801 CGAAAGAGCA TCAAGGCCCC TGCCACACCC GCCTCCGCCA CCTTGGGGAG

851 ACCGCCCCGG ATTCCCGAGC CAGTGCACCT GCCGGTGGTG CCCGACGGCA 

901 GACTCTCCGC CGCCTCCTCT GCGTCTTCCC TGGCCTCGGC CGGCAGCGCC 

951 GAAGGTGTCG GTGGGGCCCC CACGCCCAAG GGGCAGGCAG CACCTCCAGC 

1001 CCCACCTCTC CCCAGCTCCT TGGACCCACC TCCTCCACCA GCAGCCGTCG 

1051 AGGTGTTCCA GCGGCCTCCC ACGCTGGAGG ACTTGTCCCC ACCCCCACCG 

1101 GACGAAGAGC TGCCCCTGCC ACTGGACCTG CCTCCTCCTC CACCCCTGGA 

1151 TGGAGATGAA TTGGGGCTGC CTCCACCCCC ACCAGGATTT GGGCCTGATG 

1201 AGCCCAGCTG GGTGCCTGCC TCATACTTGG AGAAAGTGGT GACACTGTAC
1251 CCATACACCA GCCAGAAGGA CAATGAGCTC TCCTTCTCTG AGGGCACTGT 
1301 CATCTGTGTC ACTCGCCGCT ACTCCGATGG CTGGTGCGAG GGCGTCAGCT 

1351 CGGAGGGGAC TGGATTCTTC CCTGGGAACT ATGTGGAGCC CAGCTGCTGA
1401 CAGCCCAGGG CTCTCTGGGC AGCTGATGTC TGCACTGAGT GGGTTTCATG 

1451 AGCCCCAAGC CAAAACCAGC TCCAGTCACA GCTGGACTGG GTCTGCCCAC
1501 CTCTTGGGCT GTGAGCTGTG TTCTGTCCTT CCTCCCATCG GAGGGAGAAG 
1551 GGGTCCTGGG GAGAGAGAAT TTATCCAGAG GCCTGCTGCA GATGGGGAAG 
1601 AGCTGGAAAC CAAGAAGTTT GTCAACAGAG GACCCCTACT CCATGCAGGA 
1651 CAGGGTCTCC TGCTGCAAGT CCCAACTTTG AATAAAACAG ATGATGTCCA 
1701 AAAAAAAAAA AAAAAAAAA 
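
The annotation "Poly A stretch at pos. 1699, polyadenylation signal at pos. 1680" can be checked against the last two rows of the listing above. The sketch below is an illustration only: it assumes the canonical AATAAA hexamer as the signal, a trailing run of at least ten A residues as the poly(A) stretch, and 0-based offsets for the reported positions (an inference from this entry, not something the report states). The function name `find_polya_features` is hypothetical.

```python
def find_polya_features(seq: str, min_run: int = 10):
    """Return (poly(A) start, AATAAA start) as 0-based offsets, or None if absent.

    The poly(A) stretch is taken to be the run of A residues at the 3' end;
    the signal is the last AATAAA located upstream of that run.
    """
    stripped = seq.rstrip("A")
    run_length = len(seq) - len(stripped)
    polya_start = len(stripped) if run_length >= min_run else None
    signal = seq.rfind("AATAAA", 0, polya_start)
    return polya_start, (signal if signal >= 0 else None)

# Rows 1651 and 1701 of the cDNA listing above, blanks removed
TAIL = (
    "CAGGGTCTCCTGCTGCAAGTCCCAACTTTGAATAAAACAGATGATGTCCA"
    "AAAAAAAAAAAAAAAAAAA"
)
OFFSET = 1650  # number of bases that precede TAIL in the insert

if __name__ == "__main__":
    polya, signal = find_polya_features(TAIL)
    print("poly(A) stretch at", OFFSET + polya)
    print("AATAAA signal at", OFFSET + signal)
```

With these assumptions the two offsets come out as 1699 and 1680, matching the annotation; with 1-based numbering they would be 1700 and 1681.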



BLAST Results 



Entry AC004797 from database EMBL: 

Homo sapiens chromosome 17, clone hRPC.62_0_9, complete sequence. 
Score = 2316, P = 5.9e-255, identities = 464/465 
7 exons Bp 93317-110902 
Medline entries



97163405: 

Isolation and characterization of e3B1, an eps8 binding
protein that regulates cell growth. 

98256293: 

Identification of a candidate human spectrin Src homology 3 
domain-binding protein suggests a general mechanism of 
association of tyrosine kinases with the spectrin-based 
membrane skeleton. 



Peptide information for frame 3 



ORF from 300 bp to 1397 bp; peptide length: 366 
Category: strong similarity to known protein 



1 MAELQQLQEF EIPTGREALR GNHSALLRVA DYCEDNYVQA TDKQKALEET

51 MAFTTQALAS VAYQVGNLAG HTLRMLDLQG AALRQVEARV STLGQMVNMH

101 MEKVARREIG TLATVQRLPP GQKVIAPENL PPLTPYCRRP LNFGCLDDIG

151 HGIKDLSTQL SRTGTLSRKS IKAPATPASA TLGRPPRIPE PVHLPVVPDG

201 RLSAASSASS LASAGSAEGV GGAPTPKGQA APPAPPLPSS LDPPPPPAAV

251 EVFQRPPTLE ELSPPPPDEE LPLPLDLPPP PPLDGDELGL PPPPPGFGPD

301 EPSWVPASYL EKVVTLYPYT SQKDNELSFS EGTVICVTRR YSDGWCEGVS

351 SEGTGFFPGN YVEPSC
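
The ORF coordinates and the peptide length quoted above are related by simple arithmetic, and the start of the peptide can be checked by translating the cDNA from the stated start position. The sketch below assumes the standard genetic code and assumes that the quoted ORF end excludes the stop codon (this is inferred from the numbers, which only divide evenly under that reading). The names `orf_peptide_length` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for an ORF given inclusive 1-based start and end positions."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# Bases 300-350 of the cDNA listing above (last base of row 251 plus row 301)
ORF_START = "ATGGCGGAGCTACAGCAGCTGCAGGAGTTTGAGATCCCCACTGGCCGGGAG"

if __name__ == "__main__":
    print(orf_peptide_length(300, 1397))
    print(translate(ORF_START))
```

The same length check applies to the other entries in this listing, for example 299 to 892 bp for a 198-residue peptide.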



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphf kd2_24n20, frame 3 
No Alert BLASTP hits found 



Pedant information for DKFZphf kd2_24n20, frame 3 



Report for DKFZphf kd2_24n20 . 3 



[LENGTH] 366

[MW] 38947.21

[pI] 4.93

[HOMOL] TREMBL:U87166_1 gene: "SSH3BP1"; product: "spectrin SH3 domain binding protein 1"; Homo sapiens spectrin SH3 domain binding protein 1 (SSH3BP1) mRNA, complete cds. 3e-48

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YGR136w] 9e-06

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGR136w] 9e-06

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR154w] 3e-05

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR388w] 2e-04

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR388w] 2e-04

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR162c] 4e-04

[BLOCKS] BL50002B Src homology 3 (SH3) domain proteins profile

[SUPFAM] SH3 homology 6e-17

[PROSITE] MYRISTYL 6

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] Src homology domain 3

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 24.04 %



SEQ MAELQQLQEFEIPTGREALRGNHSALLRVADYCEDNYVQATDKQKALEETMAFTTQALAS 

SEG 

1aboA

SEQ VAYQVGNLAGHTLRMLDLQGAALRQVEARVSTLGQMVNMHMEKVARREIGTLATVQRLPP 

SEG 

1aboA
SEQ GQKVIAPENLPPLTPYCRRPLNFGCLDDIGHGIKDLSTQLSRTGTLSRKSIKAPATPASA

SEG 

1aboA

SEQ TLGRPPRIPEPVHLPVVPDGRLSAASSASSLASAGSAEGVGGAPTPKGQAAPPAPPLPSS 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

1aboA

SEQ LDPPPPPAAVEVFQRPPTLEELSPPPPDEELPLPLDLPPPPPLDGDELGLPPPPPGFGPD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

1aboA

SEQ EPSWVPASYLEKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSEGTGFFPGN 

SEG xx 

1aboA EECCCBCCCTTTBCCBTTTEEEEEEEETTTTEEEEEETTEEEEEEGG

SEQ YVEPSC 

SEG 

1aboA GEEE..



Prosite for DKFZphfkd2_24n20.3

PS00001   22->26     ASN_GLYCOSYLATION   PDOC00001
PS00004   339->343   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   14->17     PKC_PHOSPHO_SITE    PDOC00005
PS00005   41->44     PKC_PHOSPHO_SITE    PDOC00005
PS00005   72->75     PKC_PHOSPHO_SITE    PDOC00005
PS00005   167->170   PKC_PHOSPHO_SITE    PDOC00005
PS00005   170->173   PKC_PHOSPHO_SITE    PDOC00005
PS00005   225->228   PKC_PHOSPHO_SITE    PDOC00005
PS00005   321->324   PKC_PHOSPHO_SITE    PDOC00005
PS00005   338->341   PKC_PHOSPHO_SITE    PDOC00005
PS00006   14->18     CK2_PHOSPHO_SITE    PDOC00006
PS00006   239->243   CK2_PHOSPHO_SITE    PDOC00006
PS00006   258->262   CK2_PHOSPHO_SITE    PDOC00006
PS00006   308->312   CK2_PHOSPHO_SITE    PDOC00006
PS00006   321->325   CK2_PHOSPHO_SITE    PDOC00006
PS00006   328->332   CK2_PHOSPHO_SITE    PDOC00006
PS00008   21->27     MYRISTYL            PDOC00008
PS00008   66->72     MYRISTYL            PDOC00008
PS00008   94->100    MYRISTYL            PDOC00008
PS00008   110->116   MYRISTYL            PDOC00008
PS00008   215->221   MYRISTYL            PDOC00008
PS00008   332->338   MYRISTYL            PDOC00008



Pfam for DKFZphf kd2_24n20 . 3 



HMM_NAME Src homology domain 3 

HMM *pyVIALYDYqAqdpDELSFkEGDIIiIIEdsDD.WWrgRnnnTNGQEGW 
++V+ LY+Y++Q ++ELSF EG +1 + + D W++G + +G+ 
Query 311 EKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSE GTGF 356 

HMM IPSNYVEPi* 
+P NYVEP 

Query 357 FPGNYVEPS 365
DKFZphfkd2_24p5



group: intracellular transport and trafficking 

DKFZphfkd2_24p5 encodes a novel 811 amino acid protein which is a novel splice variant of
human ankyrin G.

The ankyrin 3 gene encodes a novel ankyrin, which is expressed in multiple tissues, with very
high expression at the axonal initial segment and nodes of Ranvier of neurons in the central
and peripheral nervous systems. Ankyrin G undergoes several tissue-specific alternative mRNA
processing events. The different ankyrin G proteins participate in maintenance/targeting of ion
channels and cell adhesion molecules to nodes of Ranvier and axonal initial segments.

The new protein can find application in modulating the structure and membrane topology of
Ranvier nodes and other neuronal cell membranes.



Human ankyrin G (ANK-3) new splice variant 
splice variant 

potential frame shift at 2720 was checked 
see BLASTX 

Sequenced by EMBL 

Locus: /map="10q21" 

Insert length: 3470 bp 

Poly A stretch at pos. 3459, no polyadenylation signal found



1 AGCTTTAAAA GGATGTCTGC GAAGTGGTCA AAAGGATCTT AACCTCAATT 
51 AAGTGGGGTT TTTTAAAAAG ATTTTTTGGG GGGCCTGAAA TTTTGAAAAT 
101 CTTCGAACTC TGAGTGGGGA AAGATGTATA ATTCCTCAAT TGCCTACGAG 
151 GATATCAAGA TGCTGAGAGG AATTCAGCGG TGGTGAAGAG AGTGGATACA 
201 AACCAGGGAT TGGTTTCCTT GAGCTGTTTT GGAGGTTGAT TCTAAATCAC 
251 TGCTTAAGGA ATTCCTGGAA ACATCAGGAA AACATTTGAT CATCCAAGCC 
301 TAGTGGAAAT GGCTTTACCG CAGAGTGAAG ATGCAATGAC CGGGGACACA 
351 GACAAATATC TTGGGCCACA GGACCTTAAG GAATTGGGTG ATGATTCCCT 
401 GCCTGCAGAG GGTTACATGG GCTTTAGTCT CGGAGCGCGT TCTGCCAGCC 
451 TCCGCTCCTT CAGTTCGGAT GGGTCTTACA CCTTGAACAG AAGCTCCTAT 
501 GCACGGGACA GCATGATGAT TGAAGAACTC CTCGTGCCAT CCAAAGAGCA 
551 GCATCTAACA TTCACAAGGG AATTTGATTC AGATTCTCTT AGACATTACA 
601 GCTGGGCTGC AGACACCTTA GACAATGTCA ATCTTGTTCC AAGCCCCATT 
651 CATTCTGGGT TTCTGGTTAG CTTTATGGTG GACGCGAGAG GGGGCTCCAT 
701 GAGAGGAAGC CGTCATCACG GGATGAGAAT CATCATTCCT CCACGCAAGT 
751 GTACGGCCCC CACTCGAATC ACCTGCCGTT TGGTAAAGAG ACATAAACTG 
801 GCCAACCCAC CCCCCATGGT GGAAGGAGAG GGATTAGCCA GTAGGCTGGT 
851 AGAAATGGGT CCTGCAGGGG CACAATTTTT AGGCCCTGTC ATAGTGGAAA 
901 TCCCTCACTT TGGGTCCATG AGAGGAAAAG AGAGAGAACT CATTGTTCTT 
951 CGAAGTGAAA ATGGTGAAAC TTGGAAGGAG CATCAGTTTG ACAGCAAAAA 
1001 TGAAGATTTA ACCGAGTTAC TTAATGGCAT GGATGAAGAA CTTGATAGCC 
1051 CAGAAGAGTT AGGGAAAAAG CGTATCTGCA GGATTATCAC GAAAGATTTC 
1101 CCCCAGTATT TTGCAGTGGT TTCCCGGATT AAGCAGGAAA GCAACCAGAT 
1151 TGGTCCTGAA GGTGGAATTC TGAGCAGCAC CACAGTGCCC CTTGTTCAAG 
1201 CATCTTTCCC AGAGGGTGCC CTAACTAAAA GAATTCGAGT GGGCCTCCAG 
1251 GCCCAGCCTG TTCCAGATGA AATTGTGAAA AAGATCCTTG GAAACAAAGC 
1301 AACTTTTAGC CCAATTGTCA CTGTGGAACC AAGAAGACGG AAATTCCATA 
1351 AACCAATCAC AATGACCATT CCGGTGCCCC CGCCCTCAGG AGAAGGTGTA 
1401 TCCAATGGAT ACAAAGGGGA CACTACACCC AATCTGCGTC TTCTCTGTAG 
1451 CATTACAGGG GGCACTTCGC CTGCTCAGTG GGAAGACATC ACAGGAACAA 
1501 CTCCTTTGAC GTTTATAAAA GATTGTGTCT CCTTTACAAC CAATGTTTCA 
1551 GCCAGATTTT GGCTTGCAGA CTGCCATCAA GTTTTAGAAA CTGTGGGGTT 
1601 AGCCACGCAA CTGTACAGAG AATTGATATG TGTTCCATAT ATGGCCAAGT 
1651 TTGTTGTTTT TGCCAAAATG AATGATCCCG TAGAATCTTC CTTGCGATGT 
1701 TTCTGCATGA CAGATGACAA AGTGGACAAA ACTTTAGAGC AACAAGAGAA 
1751 TTTTGAGGAA GTCGCAAGAA GCAAAGATAT TGAGGTTCTG GAAGGAAAAC 
1801 CTATTTATGT TGATTGTTAT GGAAATTTGG CCCCACTTAC CAAAGGAGGA 
1851 CAGCAACTTG TTTTTAACTT TTATTCTTTC AAAGAAAATA GACTGCCATT 
1901 TTCCATCAAG ATTAGAGACA CCAGCCAAGA GCCCTGTGGT CGTCTGTCTT 
1951 TTCTGAAAGA ACCAAAGACA ACAAAAGGAC TGCCTCAAAC AGCGGTTTGC 
2001 AACTTAAATA TCACTCTGCC AGCACATAAA AAGATTGAGA AAACAGATGG 
2051 ACGACAGAGC TTCGCATCCT TAGCTTTACG TAAGCGCTAC AGCTACTTGA 
2101 CTGAGCCTGG AATGAGTCCA CAGAGTCCAT GTGAACGGAC AGATATCAGG 
2151 ATGGCAATAG TAGCCGATCA CCTGGGACTT AGTTGGACAG AACTGGCAAG 
2201 GGAACTGAAT TTTTCAGTGG ATGAAATCAA TCAAATACGT GTGGAAAATC 
2251 CAAATTCTTT AATTTCTCAG AGCTTCATGT TTTTAAAAAA ATGGGTTACC 
2301 AGAGACGGAA AAAATGCCAC AACTGATGCC TTAACTTCGG TCTTGACAAA 
2351 AATTAATCGA ATAGATATAG TGACACTGCT AGAAGGACCA ATATTTGATT
2401 ATGGAAATAT TTCAGGCACC AGAAGTTTTG CAGATGAGAA CAATGTTTTC
2451 CATGACCCTG TTGATGGTTA TCCTTCCCTT CAAGTGGAAC TGGAAACCCC
2501 CACAGGGTTG CACTACACAC CACCTACCCC TTTCCAGCAA GATGATTATT
2551 TTAGTGATAT CTCTAGCATA GAATCTCCCC TTAGAACCCC TAGTAGACTG
2601 AGTGATGGGC TAGTGCCTTC CCAGGGGAAC ATAGAGCATT CCGCAGATGG
2651 ACCTCCAGTC GTAACTGCAG AAGACGCTTC CTTAGAAGAC AGCAAACTGG
2701 AAGACTCAGT GCCTTTAACA GAAATGCCTG AAGCAGTGAT GTAGATGAGA
2751 GCCAGTTGGA GAATGTATGT CTGAGTTGGC AGAATGAGAC ATCAAGTGGA
2801 AACCTAGAGT CCTGCGCTCA AGCTCGAAGA GTAACTGGTG GGTTACTAGA
2851 TCGACTGGAT GACAGCCCTG ACCAGTGTAG AGATTCCATT ACCTCATATC
2901 TCAAAGGAGA AGCTGGCAAA TTTGAAGCAA ATGGAAGCCA TACAGAAATC
2951 ACTCCAGAAG CAAAGACAAA ATCTTACTTT CCAGAATCCC AAAATGATGT
3001 AGGAAAACAG AGTACCAAGG AAACTCTGAA ACCAAAAATA CATGGATCTG
3051 GTCATGTTGA AGAACCAGCA TCACCACTAG CAGCATATCA GAAATCTCTA
3101 GAAGAAACCA GCAAGCTTAT AATAGAAGAG ACTAAACCCT GTGTGCCTGT
3151 CAGTATGAAA AAGATGAGTA GGACTTCTCC AGCAGATGGC AAGCCAAGGC
3201 TTAGCCTCCA TGAAGAAGAG GGGTCCAGTG GGTCTGAGCA AAAGCAGGGA
3251 GAAGGTTTTA AGGTGAAAAC GAAGAAAGAA ATCCGGCATG TGGAAAAGAA
3301 GAGCCACTCG TAACAGCGAA CGGTCAGTCA AGGATCATAA GTTTTTACTG
3351 CCAGTATTGA GAAATTCGTG GAAGAAATGT CAGCAGGAAG TAAAAATTCA
3401 CCGAGAAGTG TGTGTGTGTT CGCTGCTTCC ACACATTAAT GGCATGATTT
3451 TTTTTATGCA AAAAAAAAAA



BLAST Results 



Entry MMANK3A_1 from database TREMBL: 

Ank3"; product: "ankyrin 3"; Mus mu... +3 4022 0.0 2 

Entry HS13616 from database EMBL : 

Human ankyrin G (ANK-3) mRNA, complete cds. 

Length = 14,770 

Plus Strand HSPs: 

Score = 8505 (1275.1 bits), Expect = 0.0, Sum P(3) = 0.0 
Identities = 1799/1873 (96%) 
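
The "Identities" lines in these reports give a match count, an alignment length and an integer percentage. As a small sketch, the percentage appears to be truncated rather than rounded; that is an inference from the figures printed in this listing (for example 769/805 is reported elsewhere as 95%, although it rounds to 96%). The helper names `count_identities` and `blast_percent` are hypothetical and are not part of any BLAST program.

```python
def count_identities(query: str, subject: str) -> int:
    """Count positions where two equally long aligned strings carry the same residue.

    Gap characters ('-') are never counted as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))

def blast_percent(matches: int, length: int) -> int:
    """Integer percentage as printed in the reports (assumed to be truncated)."""
    return 100 * matches // length

if __name__ == "__main__":
    print(blast_percent(1799, 1873))
    print(count_identities("MALPQSEDAM", "MALPHSEDAI"))
```

Applied to the figures above, `blast_percent(1799, 1873)` reproduces the reported 96%.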



Medline entries 



95394457: 

Chromosomal localization of the ankyrinG gene 
(ANK3/Ank3) to human 10q21 and mouse 10. 

95138209: 

A new ankyrin gene with neural-specific isoforms localized at the 
axonal initial segment and node of Ranvier 



Peptide information for frame 3 



ORF from 309 bp to 2741 bp; peptide length: 811 
Category: known protein 
Classification: unset 



1 MALPQSEDAM TGDTDKYLGP QDLKELGDDS LPAEGYMGFS LGARSASLRS
51 FSSDGSYTLN RSSYARDSMM IEELLVPSKE QHLTFTREFD SDSLRHYSWA
101 ADTLDNVNLV PSPIHSGFLV SFMVDARGGS MRGSRHHGMR IIIPPRKCTA
151 PTRITCRLVK RHKLANPPPM VEGEGLASRL VEMGPAGAQF LGPVIVEIPH
201 FGSMRGKERE LIVLRSENGE TWKEHQFDSK NEDLTELLNG MDEELDSPEE
251 LGKKRICRII TKDFPQYFAV VSRIKQESNQ IGPEGGILSS TTVPLVQASF
301 PEGALTKRIR VGLQAQPVPD EIVKKILGNK ATFSPIVTVE PRRRKFHKPI
351 TMTIPVPPPS GEGVSNGYKG DTTPNLRLLC SITGGTSPAQ WEDITGTTPL
401 TFIKDCVSFT TNVSARFWLA DCHQVLETVG LATQLYRELI CVPYMAKFVV
451 FAKMNDPVES SLRCFCMTDD KVDKTLEQQE NFEEVARSKD IEVLEGKPIY
501 VDCYGNLAPL TKGGQQLVFN FYSFKENRLP FSIKIRDTSQ EPCGRLSFLK
551 EPKTTKGLPQ TAVCNLNITL PAHKKIEKTD GRQSFASLAL RKRYSYLTEP
601 GMSPQSPCER TDIRMAIVAD HLGLSWTELA RELNFSVDEI NQIRVENPNS
651 LISQSFMFLK KWVTRDGKNA TTDALTSVLT KINRIDIVTL LEGPIFDYGN
701 ISGTRSFADE NNVFHDPVDG YPSLQVELET PTGLHYTPPT PFQQDDYFSD
751 ISSIESPLRT PSRLSDGLVP SQGNIEHSAD GPPVVTAEDA SLEDSKLEDS
801 VPLTEMPEAV M



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphf kd2_24p5, frame 3 

TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds., N = 1,
Score = 4022, P = 0

TREMBL:MMANK3B_3 gene: "Ank3"; product: "ankyrin 3"; Mus musculus
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score =
4005, P = 0

TREMBL:MMANK3B_4 gene: "Ank3"; product: "ankyrin 3"; Mus musculus
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score =
4005, P = 0



>TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 
Length = 1,094 

HSPs : 

Score = 4022 (603.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 769/805 (95%), Positives = 783/805 (97%) 

Query:   1 MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 60
           MALP SEDA+TGDTDKYLGPQDLKELGDDSLPAEGY+GFSLGARSASLRSFSSD SYTLN
Sbjct:   1 MALPHSEDAITGDTDKYLGPQDLKELGDDSLPAEGYVGFSLGARSASLRSFSSDRSYTLN 60

Query:  61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 120
           RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLV SP+HSGFLV
Sbjct:  61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVSSPVHSGFLV 120

Query: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180
           SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL
Sbjct: 121 SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180

Query: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 240
           VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDL ELLNG
Sbjct: 181 VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLAELLNG 240

Query: 241 MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300
           MDEELDSPEELG KRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF
Sbjct: 241 MDEELDSPEELGTKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 300

Query: 301 PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360
           PEGALTKRIRVGLQAQPVP+E VKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS
Sbjct: 301 PEGALTKRIRVGLQAQPVPEETVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 360

Query: 361 GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420
           GEGVSNGYKGD TPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA
Sbjct: 361 GEGVSNGYKGDATPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 420

Query: 421 DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 480
           DCHQVLETVGLA+QLYRELICVPYMAKFVVFAK NDPVESSLRCFCMTDD+VDKTLEQQE
Sbjct: 421 DCHQVLETVGLASQLYRELICVPYMAKFVVFAKTNDPVESSLRCFCMTDDRVDKTLEQQE 480

Query: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540
           NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ
Sbjct: 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540

Query: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 600
           EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKK EK D RQSFASLALRKRYSYLTEP
Sbjct: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKAEKADRRQSFASLALRKRYSYLTEP 600

Query: 601 GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 660
            MSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFM LK
Sbjct: 601 SMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMLLK 660

Query: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720
           KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG
Sbjct: 661 KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 720

Query: 721 YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 780
           +PS QVELETP GL++TPP PFQQDD+FSDISSIESP RTPSRLSDGLVPSQGNIEH
Sbjct: 721 HPSFQVELETPMGLYWTPPNPFQQDDHFSDISSIESPFRTPSRLSDGLVPSQGNIEHPTG 780

Query: 781 GPPVVTAEDASLEDSKLEDSVPLTE 805
           GPPVVTAED SLEDSK++DSV +T+
Report for DKFZphfkd2_24p5.3



[LENGTH] 811 

[MW] 90104.66 

[pI] 5.40

[HOMOL] TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 0.0

[BLOCKS] BL50017B Death domain proteins profile 

[PIRKW] phosphoprotein 0.0 

[PIRKW] alternative splicing 0.0 

[PIRKW] peripheral membrane protein 0.0 

[PIRKW] cytoskeleton 0.0 

[SUPFAM] ankyrin 0.0 

[SUPFAM] ankyrin repeat homology 0.0 

[SUPFAM] unassigned ankyrin repeat proteins 0.0 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 1.73 % 



SEQ MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccceeeeeccccccccc 

MEM 

SEQ RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 

SEG 

PRD cccchhhhhhhhheeeehhhhhhhhhhhccccccccccccccccccccccccccccceee 

MEM MMMMMMMMMMMM 

SEQ SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 

SEG xxxxxxxxxxxxxx 

PRD eeeeeccccccccccccceeeecccccccccceeeeehhhhhccccccccccccccccee 

MEM MMMMMMMMMMMMMMMM M 

SEQ VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG

SEG 

PRD eecccccceeeceeeeeeccccccccccceeeeeeccccceeeeeccccccchhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 

SEG 

PRD cccccchhhhhhhhheeeeeeccccceeeeehhhhhcccccccccccccceeeeeeeccc 

MEM 

SEQ PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS

SEG 

PRD ccchhhhhhhhhhhhhccccceeeeccccccccccceeeccccccccccceeeecccccc 

MEM 

SEQ GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA

SEG 

PRD ccccccccccccccceeeeeeeeccccccccccccccceeeeeeccccccccccceeeec 

MEM 

SEQ DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhcchhhhhheeecccchhhhhhhhccccchhhhhhhhhc 

MEM 

SEQ NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ

SEG 

PRD cceeecccceeeeeeccceeeeecccccccchhhhhhhhhchhhhhhhcceeeeeecccc 

MEM 

SEQ EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP

SEG

PRD ccccceeeeccccccccccccccccccccccccccccccccchhhhhhhhhhhhheeecc 

MEM 

SEQ GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 

SEG 

PRD ccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcceeeeecccchhhhhhhhhh 

MEM 
SEQ KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG

SEG 

PRD hhhhcccccccchhhhhhhhhhcceeeeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 

SEG 

PRD cccceeeeeccccccccccccccccccccceeeccccccccccccccccccccccccccc 

MEM 

SEQ GPPVVTAEDASLEDSKLEDSVPLTEMPEAVM 

SEG 

PRD ccceeeecccccccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphf kd2_24p5 . 3 ) 
(No Pfam data available for DKFZphf kd2_24p5 . 3) 



WO 01/12659 



PCT/IB00/01496 



DKFZphfkd2_3i13



group: transmembrane protein 

DKFZphfkd2_3i13 encodes a novel 406 amino acid protein with similarity to C. elegans cosmid
Y37D8A and A. thaliana H71412 hypothetical proteins.

The novel protein contains 3 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific
genes and as a new marker for kidney cells.



similarity to A. thaliana and C. elegans;
membrane regions: 3

complete cDNA, complete cds, EST hits

Sequenced by BMFZ

Locus: /map="17"

Insert length: 2052 bp

Poly A stretch at pos. 2032, no polyadenylation signal found



1 AGTGACGTGA GCGGGTTCCG GTTGTCTGGA GCCCAGCGGC GGGTGTGAGA 
51 GTCCGTAAGG AGCAGCTTCC AGGATCCTGA GATCCGGAGC AGCCGGGGTC 
101 GGAGCGGCTC CTCAAGAGTT ACTGATCTAT GAAATGGCAG AGAATGGAAA 
151 AAATTGTGAC CAGAGACGTG TAGCAATGAA CAAGGAACAT CATAATGGAA 
201 ATTTCACAGA CCCCTCTTCA GTGAATGAAA AGAAGAGGAG GGAGCGGGAA 
251 GAAAGGCAGA ATATTGTCCT GTGGAGACAG CCGCTCATTA CCTTGCAGTA 
301 TTTTTCTCTG GAAATCCTTG TAATCTTGAA GGAATGGACC TCAAAATTAT 
351 GGCATCGTCA AAGCATTGTG GTGTCTTTTT TACTGCTGCT TGCTGTGCTT 
401 ATAGCTACGT ATTATGTTGA AGGAGTGCAT CAACAGTATG TGCAACGTAT 
451 AGAGAAACAG TTTCTTTTGT ATGCCTACTG GATAGGCTTA GGAATTTTGT 
501 CTTCTGTTGG GCTTGGAACA GGGCTGCACA CCTTTCTGCT TTATCTGGGT 
551 CCACATATAG CCTCAGTTAC ATTAGCTGCT TATGAATGCA ATTCAGTTAA 
601 TTTTCCCGAA CCACCCTATC CTGATCAGAT TATTTGTCCA GATGAAGAGG 
651 GCACTGAAGG AACCATTTTT TTGTGGAGTA TCATCTCAAA AGTTAGGATT 
701 GAAGCCTGCA TGTGGGGTAT CGGTACAGCA ATCGGAGAGC TGCCTCCATA 
751 TTTCATGGCC AGAGCAGCTC GCCTCTCAGG TGCTGAACCA GATGATGAAG 
801 AGTATCAGGA ATTTGAAGAG ATGCTGGAAC ATGCAGAGTC TGCACAAGAC 
851 TTTGCCTCCC GGGCCAAACT GGCAGTTCAA AAACTAGTAC AGAAAGTTGG 
901 ATTTTTTGGA ATTTTGGCCT GTGCTTCAAT TCCAAATCCT TTATTTGATC 
951 TGGCTGGAAT AACGTGTGGA CACTTTCTGG TACCTTTTTG GACCTTCTTT 
1001 GGTGCAACCC TAATTGGAAA AGCAATAATA AAAATGCATA TCCAGAAAAT 
1051 TTTTGTTATA ATAACATTCA GCAAGCACAT AGTGGAGCAA ATGGTGGCTT 
1101 TCATTGGTGC TGTCCCCGGC ATAGGTCCAT CTCTGCAGAA GCCATTTCAG 
1151 GAGTACCTGG AGGCTCAACG GCAGAAGCTT CACCACAAAA GCGAAATGGG 
1201 CACACCACAG GGAGAAAACT GGTTGTCCTG GATGTTTGAA AAGTTGGTCG 
1251 TTGTCATGGT GTGTTACTTC ATCCTATCTA TCATTAACTC CATGGCACAA 
1301 AGTTATGCCA AACGAATCCA GCAGCGGTTG AACTCAGAGG AGAAAACTAA 
1351 ATAAGTAGAG AAAGTTTTAA ACTGCAGAAA TTGGAGTGGA TGGGTTCTGC 
1401 CTTAAATTGG GAGGACTCCA AGCCGGGAAG GAAAATTCCC TTTTCCAACC 
1451 TGTATCAATT TTTACAACTT TTTTCCTGAA AGCAGTTTAG TCCATACTTT 
1501 GCACTGACAT ACTTTTTCCT TCTGTGCTAA GGTAAGGTAT CCACCCTCGA 
1551 TGCAATCCAC CTTGTGTTTT CTTAGGGTGG AATGTGATGT TCAGCAGCAA 
1601 ACTTGCAACA GACTGGCCTT CTGTTTGTTA CTTTCAAAAG GCCCACATGA 
1651 TACAATTAGA GAATTCCCAC CGCACAAAAA AAGTTCCTAA GTATGTTAAA 
1701 TATGTCAAGC TTTTTAGGCT TGTCACAAAT GATTGCTTTG TTTTCCTAAG 
1751 TCATCAAAAT GTATATAAAT TATCTAGATT GGATAACAGT CTTGCATGTT 
1801 TATCATGTTA CAATTTAATA TTCCATCCTG CCCAACCCTT CCTCTCCCAT 
1851 CCTCAAAAAA GGGCCATTTT ATGATGCATT GCACACCCTC TGGGGAAATT 
1901 GATCTTTAAA TTTTGAGACA GTATAAGGAA AATCTGGTTG GTGTCTTACA 
1951 AGTGAGCTGA CACCATTTTT TATTCTGTGT ATTTAGGATG AAGTCTTGAA 
2001 AAAAACTTTA TAAAGACATC TTTAATCATT CCAAAAAAAA AAAAAAAAAA 
2051 AA 



BLAST Results 



Entry AC004686 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 17, clone 
hRPC.1073_F_15; HTGS phase 1, 8 unordered pieces. 
Score = 4142, P = 6.1e-199, identities = 830/832 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 1351 bp; peptide length: 406 
Category: similarity to unknown protein 



1 MAENGKNCDQ RRVAMNKEHH NGNFTDPSSV NEKKRREREE RQNIVLWRQP 
51 LITLQYFSLE ILVILKEWTS KLWHRQSIVV SFLLLLAVLI ATYYVEGVHQ 
101 QYVQRIEKQF LLYAYWIGLG ILSSVGLGTG LHTFLLYLGP HIASVTLAAY 
151 ECNSVNFPEP PYPDQIICPD EEGTEGTIFL WSIISKVRIE ACMWGIGTAI 
201 GELPPYFMAR AARLSGAEPD DEEYQEFEEM LEHAESAQDF ASRAKLAVQK 
251 LVQKVGFFGI LACASIPNPL FDLAGITCGH FLVPFWTFFG ATLIGKAIIK 
301 MHIQKIFVII TFSKHIVEQM VAFIGAVPGI GPSLQKPFQE YLEAQRQKLH 
351 HKSEMGTPQG ENWLSWMFEK LVVVMVCYFI LSIINSMAQS YAKRIQQRLN 
401 SEEKTK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_3i13, frame 2 

TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A, N = 1, Score = 905, P = 8.8e-91 

TREMBL:ATAC 9 8_2 gene: "YUP8H12.2"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 470, P = 1.1e-44 

PIR:H71412 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 
293, P = 6e-24 



>TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid 
Y37D8A 

Length = 457 

HSPs: 

Score = 905 (135.8 bits), Expect = 8.8e-91, P = 8.8e-91 
Identities = 167/317 (52%), Positives = 228/317 (71%) 

Query: 38 REERQNIVLWRQPLITLQYFSLEILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEG 97 
R ER+ IV WR+P I + Y +EI + E K+ +++++ + + + + Y+ G 
Sbjct: 93 RMERETIVFWRRPHIVIPYALMEIAHLAVELFFKILAHKTVLLLTAISIGLAVYGYHAPG 152 

Query: 98 VHQQYVQRIEKQFLLYAYWIGLGILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNF 157 
HQ++VQ IEK L +++W+ LG+LSS+GLG+GLHTFL+YLGPHIA+VT+AAYEC S++F 
Sbjct: 153 AHQEHVQTIEKHILWWSWWVLLGVLSSIGLGSGLHTFLIYLGPHIAAVTMAAYECQSLDF 212 

Query: 158 PEPPYPDQIICPDEEGTEGTIFLWSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGA 217 
P+PPYP+ I CP + + F W I++KVR+E+ +WG GTA+GELPPYFMARAAR+SG 
Sbjct: 213 PQPPYPESIQCPSTKSSIAVTF-WQIVAKVRVESLLWGAGTALGELPPYFMARAARISGQ 271 

Query: 218 EPDDEEYQEFEEMLE-HAESAQD----FASRAKLAVQKLVQKVGFFGILACASIPNPLFD 272 
EPDDEEY+EF E++ ES D RAK V+ + ++GF GIL ASIPNPLFD 
Sbjct: 272 EPDDEEYREFLELMNADKESDADQKLSIVERAKSWVEHNIHRLGFPGILLFASIPNPLFD 331 

Query: 273 LAGITCGHFLVPFWTFFGATLIGKAIIKMHIQKIFVIITFSKHIVEQMVAFIGAVPGIGP 332 
LAGITCGHFLVPFW+FFGATLIGKA++KMH+Q FVI+ FS H E V + +P +GP 
Sbjct: 332 LAGITCGHFLVPFWSFFGATLIGKALVKMHVQMGFVILAFSDHHAENFVKILEKIPAVGP 391 

Query: 333 SLQKPFQEYLEAQRQKLH 350 
+++P + LE QR+ LH 
Sbjct: 392 YIRQPISDLLEKQRKALH 409 





Pedant information for DKFZphfkd2_3i13, frame 2 



Report for DKFZphfkd2_3i13.2 

[LENGTH] 406 

[MW] 46298.17 

[pI] 6.47 

[HOMOL] TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid Y37D8A 1e-79 

[PROSITE] MYRISTYL 10 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 9.85 % 



SEQ MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREERQNIVLWRQPLITLQYFSLE 

SEG xxxxxxxxxx 

PRD ccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhh 

MEM MMM>3#MMMMMMMMMMMMMMMMMMMMM 

SEQ ILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEGVHQQYVQRIEKQFLLYAYWIGLG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhhhhh 

MEM MM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNFPEPPYPDQIICPDEEGTEGTIFL 

SEG xxxxxxxxxxx 

PRD hccccccccceeeeeeeccchhhhhhhhhhhccccccccccccccccccccccccceeee 

MEM 

SEQ WSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGAEPDDEEYQEFEEMLEHAESAQDF 

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhhhhhhhhhccccccccccchhhhhhhhcccccchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ASRAKLAVQKLVQKVGFFGILACASIPNPLFDLAGITCGHFLVPFWTFFGATLIGKAIIK 

SEG 

PRD hhhhhhhhhhhhhhhcceeeeeeeecccccccccccccccceeeeeeehhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ MHIQKIFVIITFSKHIVEQMVAFIGAVPGIGPSLQKPFQEYLEAQRQKLHHKSEMGTPQG 

SEG 

PRD hhhhheeeeeeechhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhcccccccc 



MEM 



SEQ ENWLSWMFEKLVVVMVCYFILSIINSMAQSYAKRIQQRLNSEEKTK 

SEG 

PRD cchhhhhhhhhheeehhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

MEM 



Prosite for DKFZphfkd2_3i13.2 

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001 
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005 
PS00006 29->33 CK2_PHOSPHO_SITE PDOC00006 
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006 
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006 
PS00008 120->126 MYRISTYL PDOC00008 
PS00008 126->132 MYRISTYL PDOC00008 
PS00008 173->179 MYRISTYL PDOC00008 
PS00008 195->201 MYRISTYL PDOC00008 
PS00008 197->203 MYRISTYL PDOC00008 
PS00008 259->265 MYRISTYL PDOC00008 
PS00008 275->281 MYRISTYL PDOC00008 
PS00008 325->331 MYRISTYL PDOC00008 
PS00008 329->335 MYRISTYL PDOC00008 
PS00008 356->362 MYRISTYL PDOC00008 
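
The site positions in the Prosite table above can be reproduced with a short pattern scan. The sketch below is illustrative only and is not the tool used to generate the report: the regular expressions are hand translations of the PROSITE patterns PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]); the function name `scan_prosite` is hypothetical; and the `start->end` convention (end = start + pattern length) is inferred from the table, not documented. Only the first 40 residues of the frame 2 peptide are scanned.

```python
import re

# Hand-translated PROSITE patterns (assumption: standard pattern definitions).
PATTERNS = {
    "PS00001": r"N[^P][ST][^P]",  # ASN_GLYCOSYLATION
    "PS00005": r"[ST].[RK]",      # PKC_PHOSPHO_SITE
    "PS00006": r"[ST]..[DE]",     # CK2_PHOSPHO_SITE
}


def scan_prosite(peptide, patterns=PATTERNS):
    """Return (accession, start, end) tuples with 1-based start positions."""
    hits = []
    for accession, regex in patterns.items():
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer(f"(?=({regex}))", peptide):
            start = match.start() + 1
            # Assumed convention: end = start + pattern length, as in the table.
            hits.append((accession, start, start + len(match.group(1))))
    return sorted(hits, key=lambda hit: (hit[0], hit[1]))


# First 40 residues of the frame 2 peptide listed for this entry.
PEPTIDE_PREFIX = "MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREE"

hits = scan_prosite(PEPTIDE_PREFIX)
for accession, start, end in hits:
    print(f"{accession} {start}->{end}")
```

Within this prefix the scan reports the N-glycosylation site at 23 and the CK2 site at 29, matching the first rows of the table; the PKC site at 69 lies outside the scanned fragment.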



(No Pfam data available for DKFZphfkd2_3i13.2) 



DKFZphfkd2_3o17 



group: metabolism 

DKFZphfkd2_3o17 encodes a novel 72 amino acid protein with similarity to the Bos taurus NADH- 
ubiquinone oxidoreductase B22 subunit (EC 1.6.5.3) (EC 1.6.99.3). 

NADH:ubiquinone oxidoreductase is the first enzyme in the respiratory electron transport chain 
of mitochondria. It is a membrane-bound multi-subunit protein. The bovine heart enzyme 
contains about 40 different polypeptides. The novel protein is the human orthologue of bovine 
B22. 

The new protein can find application in modulation of the respiratory electron transport chain 
pathways of mitochondria. 



strong similarity to bovine NADH- UBIQUINONE OXIDOREDUCTASE B22 subunit 

complete cDNA, complete cds, EST hits, 

in frame stop codon at -274 will be checked 

ESTs HS1291620/AA883920 show no stop codon at this site 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 693 bp 

Poly A stretch at pos. 670, polyadenylation signal at pos . 659 



1 CAGCAGGCGT GCAGTTTCCC GGCTCTCCGC GCGGCCGGGG AAGGTCAGCG 

51 CCGTAATGGC GTTCTTGGCG TCGGGACCCT ACCTGACCCA TCAGCAAAAG 

101 GTGTTGCGGC TTTATAAGCG GGCGCTACGC CACCTCGAGT CGTGGTGCGT 

151 CCAGAGAGAC AAATACCGAT ACTTTGCTTG TTTGATGAGA GCCCGGTTTG 

201 AAGAACATAA GAATGAAAAG GATATGGCGA AGGCCACCCA GCTGCTGAAG 

251 GAGGCCGAGG AAGAATTCTG GTAACGTCAG CATCCACAGC CATACATCTT 

301 CCCTGACTCT CCTGGGGGCA CCTCCTATGA GAGATACGAT TGCTACAAGG 

351 TCCCAGAATG GTGCTTAGAT GACTGGCATC CTTCTGAGAA GGCAATGTAT 

401 CCTGATTACT TTGCCAAGAG AGAACAGTGG AAGAAACTGC GGAGGGAAAG 

451 CTGGGAACGA GAGGTTAAGC AGCTGCAGGA GGAAACGCCA CCTGGTGGTC 

501 CTTTAACTGA AGCTTTGCCC CCTGCCCGAA AGGAAGGTGA TTTGCCCCCA 

551 CTGTGGTGGT ATATTGTGAC CAGACCCCGG GAGCGGCCCA TGTAGAAAGA 

601 GAGAGACCTC ATCTTTCATG CTTGCAAGTG AAATATGTTA CAGAACATGC 

651 ACTTGCCCTA ATAAAAAATC AGTAAAAAAA AAAAAAAAAA AAA 
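
The polyadenylation signal position reported for this insert can be checked with a simple hexamer search. The sketch below is an assumption-laden illustration, not the annotation pipeline itself: the function name `find_polya_signal` is hypothetical, the motif set (AATAAA plus the common ATTAAA variant) is a conventional choice, and the reading of the reported position as a 0-based offset is an inference from this listing.

```python
def find_polya_signal(seq, offset=0, motifs=("AATAAA", "ATTAAA")):
    """Return the 0-based offset of the last signal hexamer, or None if absent.

    `offset` is the 0-based offset of `seq` within the full insert, so a
    sequence fragment can be searched without the whole cDNA.
    """
    seq = seq.upper()
    best = max(seq.rfind(motif) for motif in motifs)
    return None if best < 0 else offset + best


# Last row of the insert above (1-based positions 651 to 693).
TAIL = "ACTTGCCCTA" "ATAAAAAATC" "AGTAAAAAAA" "AAAAAAAAAA" "AAA"

signal = find_polya_signal(TAIL, offset=650)
print(signal)
```

Under these assumptions the hexamer AATAAA is found at offset 659, which agrees with the position given in the entry header if that value is read as 0-based.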



BLAST Results 



Entry S28256 from database PIR: 

NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 
>TREMBL:MIBTCIB22_1 gene: "CI-B22"; product: "NADH-ubiquinone 
oxidoreductase complex B22 subunit"; B. taurus mitochondrion cI-B22 
mRNA for B22 subunit of the NADH-ubiquinone oxidoreductase complex 
Score = 933, P = 5.2e-93, identities = 163/179, positives = 172/179, 
frame +2 



Medline entries 



92389317 

Sequences of 20 subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria. 
Application of a novel strategy for sequencing proteins using the polymerase chain reaction 

Peptide information for frame 2 



ORF from 56 bp to 271 bp; peptide length: 72 
Category: strong similarity to known protein 
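
The ORF coordinates can be checked by translating the cDNA directly. The following is a minimal sketch assuming the standard genetic code and 1-based, inclusive ORF coordinates that exclude the stop codon; the helper name `translate_orf` is hypothetical and the sequence is the first 300 nucleotides of the insert listed for this entry.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna, start, end):
    """Translate the 1-based inclusive range [start, end]; stop at a stop codon."""
    orf = dna[start - 1:end].upper()
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Positions 1 to 300 of the insert, whitespace removed.
CDNA_5PRIME = "".join("""
CAGCAGGCGT GCAGTTTCCC GGCTCTCCGC GCGGCCGGGG AAGGTCAGCG
CCGTAATGGC GTTCTTGGCG TCGGGACCCT ACCTGACCCA TCAGCAAAAG
GTGTTGCGGC TTTATAAGCG GGCGCTACGC CACCTCGAGT CGTGGTGCGT
CCAGAGAGAC AAATACCGAT ACTTTGCTTG TTTGATGAGA GCCCGGTTTG
AAGAACATAA GAATGAAAAG GATATGGCGA AGGCCACCCA GCTGCTGAAG
GAGGCCGAGG AAGAATTCTG GTAACGTCAG CATCCACAGC CATACATCTT
""".split())

peptide = translate_orf(CDNA_5PRIME, 56, 271)
print(len(peptide), peptide)
```

The translation gives a 72-residue peptide beginning MAFLASGPYL, and the codon immediately after position 271 is the stop codon TAA.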



1 MAFLASGPYL THQQKVLRLY KRALRHLESW CVQRDKYRYF ACLMRARFEE 

51 HKNEKDMAKA TQLLKEAEEE FW*RQHPQPY IFPDSPGGTS YERYDCYKVP 

101 EWCLDDWHPS EKAMYPDYFA KREQWKKLRR ESWEREVKQL QEETPPGGPL 

151 TEALPPARKE GDLPPLWWYI VTRPRERPM 

BLASTP hits 

Sequences producing significant alignments: (bits) Value 

sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9 ..) NADH-UBIQUINONE ... 141 7e-34 
tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO NADH-UBIQ... 53 3e-07 

>sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9 ..) NADH-UBIQUINONE 
OXIDOREDUCTASE B22 SUBUNIT (EC 1.6.5.3) (EC 1.6.99.3) 
(COMPLEX I-B22) (CI-B22). [BOS TAURUS] 
Length = 178 

Score = 141 bits (351), Expect = 7e-34 
Identities = 63/71 (88%), Positives = 68/71 (95%) 

Query: 2 AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT 61 

AFL+SG YLTHQQKVLRLYKRALRHLESWC+ RDKYRYFACL+RARF+EHKNEKDM KAT 
Sbjct: 1 AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT 60 

Query: 62 QLLKEAEEEFW 72 

QLL+EAEEEFW 
Sbjct: 61 QLLREAEEEFW 71 
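
The identity figure reported for this alignment can be recomputed from the two aligned rows. This is a small sketch, assuming an ungapped alignment of equal-length strings and assuming that the percentage is truncated rather than rounded (which is what the reported 88% suggests); the function name `identity_stats` is hypothetical.

```python
def identity_stats(query, subject):
    """Return (identical columns, aligned columns, truncated percent identity)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    columns = len(query)
    # Gap characters never count as identities.
    identical = sum(q == s and q != "-" for q, s in zip(query, subject))
    return identical, columns, identical * 100 // columns


# Query residues 2-72 and subject residues 1-71 from the alignment above.
QUERY = ("AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT"
         "QLLKEAEEEFW")
SBJCT = ("AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT"
         "QLLREAEEEFW")

stats = identity_stats(QUERY, SBJCT)
print("Identities = %d/%d (%d%%)" % stats)
```

With these inputs the count is 63 identical columns out of 71, in agreement with the Identities line of the report.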

>tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO 
NADH-UBIQUINONE OXIDOREDUCTASE B22. [CAENORHABDITIS 
ELEGANS] 
Length = 163 

Score = 52.7 bits (124), Expect = 3e-07 
Identities = 25/64 (39%), Positives = 41/64 (64%), Gaps = 1/64 (1%) 

Query: 10 LTHQQKVLRLYKRALRHLESWCVQRD-KYRYFACLMRARFEEHKNEKDMAKATQLLKEAE 68 
L+H+QKV RLYKR LR +++W + + R+ C++RARF+ + +E D K+ LL + 
Sbjct: 12 LSHRQKVTRLYKRCLREVDNWYGGNNLEVRFQKCIIRARFDANADEVDTRKSQILLADGC 71 

Query: 69 EEFW 72 
+ W 
Sbjct: 72 RQLW 75 

Alert BLASTP hits for DKFZphfkd2_3o17, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_3o17, frame 2 



Report for DKFZphfkd2_3o17.2 

[LENGTH] 72 

[MW] 8839.28 

[pI] 9.26 

[HOMOL] PIR:S28256 NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 2e-34 

[KW] All_Alpha 

SEQ MAFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKA 
PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhcchhhhhhh 

SEQ TQLLKEAEEEFW 
PRD hhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_3o17.2) 
(No Pfam data available for DKFZphfkd2_3o17.2) 



DKFZphfkd2_46a6 
group: kidney derived 

DKFZphfkd2_46a6 encodes a novel 315 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="228.6 cR from top of Chr15 linkage group" 
Insert length: 2774 bp 

Poly A stretch at pos. 2751, polyadenylation signal at pos. 2732 



1 CTCGCGAGCG CAGCTATGGC TGCTGGCGTA CCCTGTGCGT TAGTCACCAG 
51 CTGCTCCTCC GTCTTCTCAG GAGACCAGCT GGTCCAACAT ACCCTTGGAA 
101 CAGAAGATCT TATTGTGGAA GTGACTTCCA ATGATGCTGT GAGATTTTAT 
151 CCCTGGACCA TTGATAATAA ATACTATTCA GCAGACATCA ATCTATGTGT 
201 GGTGCCAAAC AAATTTCTTG TTACTGCAGA GATTGCAGAA TCTGTCCAAG 
251 CATTTGTGGT TTACTTTGAC AGCACACGAA AATCGGGCCT TGATAGTGTC 
301 TCCTCATGGC TTCCACTGGC AAAAGCATGG TTACCTGAGG TGATGATCTT 
351 GGTCTGCGAT AGAGTGTCTG AAGATGGTAT AAACCGACAA AAAGCTCAAG 
401 AATGGAGCCT CAAACATGGC TTTGAATTGG TAGAACTTAG TCCAGAGGAG 
451 TTGCCTGAGG AGGATGATGA CTTCCCAGAA TCTACAGGAG TAAAGCGAAT 
501 TGTCCAAGCC CTGAATGCCA ATGTGTGGTC CAATGTAGTG ATGAAGAA-'G 
551 ATAGGAACCA AGGCTTTAGC CTTCTCAACT CATTGACTGG AACAAACCAT 
601 AGCATTGGGT CAGCAGATCC CTGTCACCCA GAGCAACCCC ATTTGCCAGC 
651 AGCAGATAGT ACTGAATCCC TCTCTGATCA TCGGGGTGGT GCATCTAACA 
701 CAACAGATGC CCAGGTTGAT AGCATTGTGG ATCCCATGTT AGATCTGGAT 
751 ATTCAAGAAT TAGCCAGTCT TACCACTGGA GGAGGAGATG TGGAGAATTT 
801 TGAAAGACCC TTTTCAAAGT TAAAGGAAAT GAAAGACAAG GCTGCGACGC 
851 TTCCTCATGA GCAAAGAAAA GTGCATGCAG AAAAGGTGGC CAAAGCATTC 
901 TGGATGGCAA TCGGGGGAGA CAGAGATGAA ATTGAAGGCC TTTCATCTGA 
951 TGGAGAGCAC TGAATTATTC ATACTAGGGT TTGACCAACA AAGATGCTAG 
1001 CTGTCTCTGA GATACCTCTC TACTCAGCCC AGTCATATTT TGCCAAAATT 
1051 GCCCTTATCA TGTTGGCTGC CTGACTTGTT TATAGGGTCC CCTTAATTTT 
1101 AGTTTTTAGT AGGAGGTTAA GGAGAAATCT TTTTTTTCCT CAGTATATTG 
1151 TAAGAGAGTG AGGAATACAG TGATAGTAAT GAGTGAGGAT TTCTTAAATA 
1201 TACTTTTTTT TTGTTCTAGG AATGAGGGTA GGATAAATCT CAGAGGTCTG 
1251 TGTGATTTAC TCAAGTTGAA GACAACCTCC AGGCCATTCC TGGTCAACCT 
1301 TTTAAGTAGC ATTTCCAGCA TTCACACTTG ATACTGCACA TCAGGAGTTG 
1351 TGTCACCTTT CCTGGGTGAT TTGGGTTTTC TCCATTCAAG GAGCTTGTAG 
1401 CTCTGAGCTA TGATGCTTTT ATTGGGAGGA AAGGAGGCAG CTGCAGAATT 
1451 GATGTGAGCT ATGTGGGGCC GAAGTCTCAG CCCGCAGCTA AGTCTCTACC 
1501 TAAGAAAATG CCTCTGGGCA TTCTTTTGAA GTATAGTGTC TGAGCTCATG 
1551 CTAGAAAGAA TCAAAAAGCC AGTGTGGATT TTTAGGCTGT AATAAATGAG 
1601 GCAAAGGATT TCTATTCCAG TGGGAAGGAA ACCTCTCTAC TGAGTTGTGG 
1651 GGGATATGTT GTATGTTAGA GAGAACCTTA AGGAGTCCTT GTATGGGCCA 
1701 TGGAGACAGT ATGTGATAAC ATACCGTGAT TTTCATGAAG AAATTCTTCT 
1751 GTCCTAGAGT TCTCCCCTGC TGCTTGAGAT GCCAGAGCTG TGTTGTTGCA 
1801 CACCTGCAAA ACAAGGCACA TTTCCCCCTT TCTCTTTAAA GCCAAAGAGA 
1851 GATCACTGCC AAAGTGGGAG CACTAAGGGG TGGGTGGGGA AGTGAAATGT 
1901 TAGGCGATGA ATTCCTGAGC ACCTTGTTTT TCTTCCAAGG TTCGTAGCTC 
1951 CTCTCTGCCC TTCCAAGCCT GTAACCTCGG AGGACTATCT TTTGTTCTCT 
2001 ATCCTTTGTC TTGTTAGAGT GGGTCAGCCC CAGAGGAACT GATAAGCAAA 
2051 TGGCAAGTTT TTAAAGGAAG AGTGGAAAGT ACTGCAAATA AAAATCCTTA 
2101 TTTGTTTTTG TAGACTTTGT AATGCATATC ATTAGCCCTC ACTGTGATCA 
2151 TTACTGCTGT GGCTCTGAAC TGGCACATAG TACAGTGGAT GGAAGGTGCC 
2201 CGCACACCAG CTGAGAACTG GTTCTGGCCT AGGTGGGCTC TAGAACCATT 
2251 TACACAGCAT GAAAGAAACA GGTTGGGTTA GGAGCAGAAA GAAATAAGGC 
2301 TCACACCCCT CCAGACACTA CCTTATAAGC ACTGCAGAAC CTGAAACAGA 
2351 TGGCAGAAGG AATGGAATGC TACAGGGGCC AGCAGGAGTG ACCACAGGGA 
2401 GGGGACAGCT CAGTGACTGG AGCATTCAGG AAGAGGCTTT CCAGGGAACA 
2451 CTGGACATTG CTTAGTGACC TTTTGTTCCT [missing] TTTTCTTTTA 
2501 CTGTTCTGAA AGACTTTGAG TCTGTGGTTC ACCACCAGCC CATCAGTGTT 
2551 TCTTTGAGGT GATTGCATTA GGGAAGTTGG CTCTGGGATT GCAAAAAAAA 
2601 AAAAAAGGTG GAACATGTTT TCCTTAAAAG ATGGAAGGTT TTAGAAAATA 
2651 TACTAGGCCA TCTGGTTAGA AAAAACAGAC CAGACTAGAA AAAGCTGTGA 
2701 ATTTGATTTT GTAGATTAAA CAAAGCCAGA TGATTAAAAT GTGATTTATT 
2751 TATAAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS463358 from database EMBL : 
human STS WI-14364. 
Length = 472 
Minus Strand HSPs: 

Score = 1605 (240.8 bits), Expect = 5.0e-68, P = 5.0e-68 
Identities = 347/361 (96%) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 16 bp to 960 bp; peptide length: 315 
Category: putative protein 
Classification: unset 



1 MAAGVPCALV TSCSSVFSGD QLVQHTLGTE DLIVEVTSND AVRFYPWTID 
51 NKYYSADINL CVVPNKFLVT AEIAESVQAF VVYFDSTRKS GLDSVSSWLP 
101 LAKAWLPEVM ILVCDRVSED GINRQKAQEW SLKHGFELVE LSPEELPEED 
151 DDFPESTGVK RIVQALNANV WSNVVMKNDR NQGFSLLNSL TGTNHSIGSA 
201 DPCHPEQPHL PAADSTESLS DHRGGASNTT DAQVDSIVDP MLDLDIQELA 
251 SLTTGGGDVE NFERPFSKLK EMKDKAATLP HEQRKVHAEK VAKAFWMAIG 
301 GDRDEIEGLS SDGEH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46a6, frame 1 

PIR:T04362 probable GTP-binding protein yptm3 - maize, N = 1, Score = 
87, P = 0.21 

PIR:S71585 GTP-binding protein GB2 - Arabidopsis thaliana, N = 1, Score 
= 86, P = 0.27 



>PIR:T04362 probable GTP-binding protein yptm3 - maize 
Length = 210 

HSPs: 

Score = 87 (13.1 bits), Expect = 2.4e-01, P = 2.1e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 48 TIDNKYYSADINLCVVPNKFL-VTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWL 106 

TIDNK I F +T ++ +D TR+ + ++SWL A+ 

Sbjct: 49 TIDNKPIKLQIWDTAGQESFRSITRSYYRGAAGALLVYDITRRETFNHLASWLEDARQHA 108 

Query: 107 PE---VMIL--VCDRVSEDGINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKR 161 

VM++ CD ++ ++ ++++ +HG +E S + ++ F ++ G 

Sbjct: 109 NANMTVMLIGNKCDLSHRRAVSYEEGEQFAKEHGLVFMEASAKTAQNVEEAFIKTAGT-- 166 

Query: 162 IVQALNANVWSNVVMKNDRNQGFSLLNSLTGTNHSIGSADPC 203 

I + + ++ N G+++ NS G S AC 

Sbjct: 167 IYKKIQDGIFDVSNESNGIKVGYAVPNSSGGGAGSSSQAGGC 208 



Pedant information for DKFZphfkd2_46a6, frame 1 



Report for DKFZphfkd2_46a6.1 



[LENGTH] 315 

[MW] 34505.54 

[pI] 4.55 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.67 % 



SEQ MAAGVPCALVTSCSSVFSGDQLVQHTLGTEDLIVEVTSNDAVRFYPWTIDNKYYSADINL 

SEG 

PRD cccccceeeeecccccccccceeeeccccceeeeeeccccceeeecccccccccccccee 

SEQ CVVPNKFLVTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWLPEVMILVCDRVSED 

SEG 

PRD eeecccchhhhhhhhhhheeeeeeecccccccccccccccccccccccceeeeccccccc 

SEQ GINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKRIVQALNANVWSNVVMKNDR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhcccceeeeccccccccccccccccccchhhhhhhhcccceeeeeeccc 

SEQ NQGFSLLNSLTGTNHSIGSADPCHPEQPHLPAADSTESLSDHRGGASNTTDAQVDSIVDP 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccch 

SEQ MLDLDIQELASLTTGGGDVENFERPFSKLKEMKDKAATLPHEQRKVHAEKVAKAFWMAIG 

SEG 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhc 

SEQ GDRDEIEGLSSDGEH 

SEG 

PRD ccccccccccccccc 



(No Prosite data available for DKFZphfkd2_46a6.1) 
(No Pfam data available for DKFZphfkd2_46a6.1) 



DKFZphfkd2_46b10 



group: kidney derived 

DKFZphfkd2_46b10.1 encodes a novel 336 amino acid protein with similarity to C. elegans cosmid 
F25B5.3. 

The novel protein contains an HTH_LYSR_FAMILY PROSITE pattern. Proteins of the lysR family are 
bacterial transcriptional regulatory proteins which bind DNA using a helix-turn-helix motif. 
Most of these proteins are transcription activators and usually negatively regulate their own 
expression. They all possess a potential 'helix-turn-helix' DNA-binding motif in their 
N-terminal section. The 'helix-turn-helix' motif is missing in DKFZphfkd2_46a6.1. 
No informative BLAST results, no predictive Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 



similarity to C.elegans F25B5.3 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 1285 bp 

Poly A stretch at pos. 1266, no polyadenylation signal found 



1 CAGTCTACGC GAGCTGCCTG TTTTTTTCCT GCTTGGACGC GCATGAGGGC 

51 CCCGTCCATG GACCGCGCGG CCGTGGCGAG GGTGGGCGCG GTAGCGAGCG 

101 CCAGCGTGTG CGCCCTGGTG GCGGGGGTGG TGCTGGCTCA GTACATATTC 

151 ACCTTGAAGA GGAAGACGGG GCGGAAGACC AAGATCATCG AGATGATGCC 

201 AGAATTCCAG AAAAGTTCAG TTCGAATCAA GAACCCTACA AGAGTAGAAG 

251 AAATTATCTG TGGTCTTATC AAAGGAGGAG CTGCCAAACT TCAGATAATA 

301 ACGGACTTTG ATATGACACT CAGTAGATTT TCATATAAAG GGAAAAGATG 

351 CCCAACATGT CATAATATCA TTGACAACTG TAAGCTGGTT ACGGATGAAT 

401 GTAGAAAAAA GTTATTGCAA CTAAAGGAAA AATATTACGC TATTGAAGTT 

451 GATCCTGTTC TTACTGTAGA AGAGAAGTAC CCTTATATGG TGGAATGGTA 

501 TACTAAATCA CATGGTTTGC TTGTTCAGCA AGCTTTACCA AAAGCTAAAC 

551 TTAAAGAAAT TGTGGCAGAA TCTGACGTTA TGCTCAAAGA AGGATATGAG 

601 AATTTCTTTG ATAAGCTCCA ACAACATAGC ATCCCCGTGT TCATATTTTC 

651 GGCTGGAATC GGCGATGTAC TAGAGGAAGT TATTCGTCAA GCTGGTGTTT 

701 ATCATCCCAA TGTCAAAGTT GTGTCCAATT TTATGGATTT TGATGAAACT 

751 GGGGTGCTCA AAGGATTTAA AGGAGAACTA ATTCATGTAT TTAACAAACA 

801 TGATGGTGCC TTGAGGAATA CAGAATATTT CAATCAACTA AAAGACAATA 

851 GTAACATAAT TCTTCTGGGA GACTCCCAAG GAGACTTAAG AATGGCAGAT 

901 GGAGTGGCCA ATGTTGAGCA CATTCTGAAA ATTGGATATC TAAATGATAG 

951 AGTGGATGAG CTTTTAGAAA AGTACATGGA CTCTTATGAT ATTGTTTTAG 

1001 TACAAGATGA ATCATTAGAA GTAGCCAACT CTATTTTACA GAAGATTCTA 

1051 TAAACAAGCA TTCTCCAAGA AGACCTCTCT CCTGTGGGTG CAATTGAACT 

1101 GTTCATCCGT TCATCTTGCT GAGAGACTTA TTTATAATAT ATCCTTACTC 

1151 TCGAAGTGTT CCCTTTGTAT AACTGAAGTA TTTTCAGATA TGGTGAATGC 

1201 ATTGACTGGA AGCTCCTTTT CTCCACCTCT CTCAACACAC TCCTCACCGT 

1251 ATCTTTTAAC CCATTTAAAA AAAAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 43 bp to 1050 bp; peptide length: 336 
Category: similarity to unknown protein 
Classification: unset 

Prosite motifs: HTH_LYSR_FAMILY (16-47) 

1 MRAPSMDRAA VARVGAVASA SVCALVAGVV LAQYIFTLKR KTGRKTKIIE 
51 MMPEFQKSSV RIKNPTRVEE IICGLIKGGA AKLQIITDFD MTLSRFSYKG 
101 KRCPTCHNII DNCKLVTDEC RKKLLQLKEK YYAIEVDPVL TVEEKYPYMV 
151 EWYTKSHGLL VQQALPKAKL KEIVAESDVM LKEGYENFFD KLQQHSIPVF 
201 IFSAGIGDVL EEVIRQAGVY HPNVKVVSNF MDFDETGVLK GFKGELIHVF 
251 NKHDGALRNT EYFNQLKDNS NIILLGDSQG DLRMADGVAN VEHILKIGYL 
301 NDRVDELLEK YMDSYDIVLV QDESLEVANS ILQKIL 
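
The stated peptide lengths follow directly from the ORF coordinates. As a quick worked check, assuming the coordinates are 1-based, inclusive and exclude the stop codon, the sketch below recomputes the codon count for the five ORFs reported in this part of the listing; the helper name `orf_codons` is hypothetical.

```python
def orf_codons(start, end):
    """Number of codons in a 1-based inclusive ORF range without the stop codon."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


# (ORF start, ORF end, stated peptide length) as given in the entries.
REPORTED_ORFS = [
    (134, 1351, 406),
    (56, 271, 72),
    (16, 960, 315),
    (43, 1050, 336),
    (328, 1845, 506),
]

computed = [orf_codons(start, end) for start, end, _ in REPORTED_ORFS]
print(computed)
```

For the ORF above, (1050 - 43 + 1) / 3 = 336 residues, which is the length shown in the peptide information and in the Pedant report.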

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46b10, frame 1 

SWISSPROT: YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III., N = 1, Score = 524, P = 2.2e-50 

TREMBL:AC005499_12 gene: "T6A23.12"; Arabidopsis thaliana chromosome 
II BAC T6A23 genomic sequence, complete sequence., N = 2, Score = 194, 
P = 1.4e-26 



>SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III. 

Length = 376 

HSPs: 

Score = 524 (78.6 bits), Expect = 2.2e-50, P = 2.2e-50 
Identities = 112/300 (37%), Positives = 174/300 (58%) 

Query: 44 RKTKIIEMMPEFQ--KSSVRIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYK-G 100 
+KT ++ ++ + + + + +PT V + ++ GGA K +I+DFD TLSRF+ + G 
Sbjct: 73 KKTDVVPLLMNYLLGEEQILVADPTAVAAKLRKMVVGGAGKTVVISDFDYTLSRFANEQG 132 

Query: 101 KRCPTCHNIID-NCKLVTDECRKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGL 159 
+R T H + D N + E +K + LK KYY IE P LT+EEK P+M +W+ SH L 
Sbjct: 133 ERLSTTHGVFDDNVMRLKPELGQKFVDLKNKYYPIEFSPNLTMEEKIPHMEKWWGTSHSL 192 

Query: 160 LVQQALPKAKLKEIVAESDVMLKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQA-G 218 
+V + K +++ V +S ++ K+G E+F + L H+IP+ IFSAGIG+++E ++Q G 
Sbjct: 193 IVNEKFSKNTIEDFVRQSRIVFKDGAEDFIEALDAHNIPLVIFSAGIGNIIEYFLQQKLG 252 

Query: 219 VYHPNVKVVSNFMDFDETGVLKGFKGELIHVFNKHDGAL-RNTEYFNQLKDNSNIILLGD 277 
N +SN + FDE F LIH F K+ + + T +F+ + N+ILLGD 
Sbjct: 253 AIPRNTHFISNMILFDEDDNACAFSEPLIHTFCKNSSVIQKETSFFHDIAGRVNVILLGD 312 

Query: 278 SQGDLRMADGVANVEHILKIGYLNDRVDEL--LEKYMDSYDIVLVQDESLEVANSILQKI 335 
S GD+ M GV LK+GY N +D+ L+ Y + YDIVL+ D +L VA I+ I 
Sbjct: 313 SMGDIHMOVGVEROGPTLKVGYYNGSLDDTAALQHYEEVYDIVLIHDPTLNVAQKIVDII 372 



Pedant information for DKFZphfkd2_46b10, frame 1 



Report for DKFZphfkd2_46b10.1 



[LENGTH] 336 

[MW] 37948.37 

[pI] 6.67 

[HOMOL] SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME III. 3e-51 

[PROSITE] HTH_LYSR_FAMILY 1 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 7.44 % 



SEQ MRAPSMDRAAVARVGAVASASVCALVAGVVLAQYIFTLKRKTGRKTKIIEMMPEFQKSSV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccchhhhhcchhhhhhheeehhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYKGKRCPTCHNIIDNCKLVTDEC 

SEG 

PRD eecccchhhhhhhhhhccccceeeeecccccceeeecccccccccccccccccchhhhhh 

MEM 

SEQ RKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGLLVQQALPKAKLKEIVAESDVM 

SEG 

PRD hhhhhhhhhhhheeeccccccccccchhhhhhccccchhhhhhccchhhhhhhhhhhhcc 

MEM 

SEQ LKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQAGVYHPNVKVVSNFMDFDETGVLK 

SEG 

PRD ccccchhhhhhhhhcccceeeeecccchhhhhhhhhhcccccceeeeeecccccccccee 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ GFKGELIHVFNKHDGALRNTEYFNQLKDNSNIILLGDSQGDLRMADGVANVEHILKIGYL 

SEG 

PRD eccceeeeeeecccccccccchhhhhhhhceeeeecccccccccccccccccceeeeeec 

MEM 

SEQ NDRVDELLEKYMDSYDIVLVQDESLEVANSILQKIL 

SEG 

PRD cchhhhhhhhhhhhheeeeeecchhhhhhhhhhccc 

MEM 



Prosite for DKFZphfkd2_46b10.1 
PS00044 16->47 HTH_LYSR_FAMILY PDOC00043 

(No Pfam data available for DKFZphfkd2_46b10.1) 



DKFZphfkd2_46d13 



group: kidney derived 

DKFZphfkd2_46d13 encodes a novel 506 amino acid protein with weak similarity to KE03 protein. 
The novel protein contains an RGD site. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 



similarity to KE03 protein 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 



Locus: /map="227.6 cR from top of Chr1 linkage group" 
Insert length: 3346 bp 

Poly A stretch at pos. 3328, polyadenylation signal at pos. 3308 



1 CTCTCGCGAG AGGAGCAAGA GGAAGATGGC CGTGCCCTGT TTTTCGGTGT 

51 AAGGCAGCAG ACGGCGGCTG CGACGGCGAG ACTGAGATCC TGGTGTCGTG 

101 GGCACCTGAG TTCTAGCTTC CCCCAGCGAG CGCGCGTCCC TTCGTGCCTA 

151 GGCGAGAGCC GGCTCTTCCC CGGGAGATGC GTTTGTCCCA GGCTCGGGGG 

201 CTCAGTGGGA GTTCATGCTG CGCTGGAGGC TCTTGGCCAC CGCTCTAATC 

251 GCCTTGTGCC GCCGCAGCGC CAGCTCCGTC GCCAGCGGTG AGCCTCCCGA 

301 TTCCCCCCCT TGCCCCTGGC GGCGGCGATG ACCGGGGAGA AGATCCGCTC 

351 ACTGCGGAGG GACCACAAGC CCAGCAAAGA AGAAGGGGAC CTGCTGGAGC 

401 CCGGGGATGA AGAAGCGGCG GCTGCCCTCG GCGGTACCTT TACCAGAAGC 

451 AGGATTGGCA AGGGCGGCAA AGCTTGTCAT AAGATCTTCA GTAACCATCA 

501 CCACCGGCTA CAGCTGAAGG CAGCTCCGGC CTCCTCCAAT CCCCCCGGCG 

551 CCCCGGCTCT GCCGCTGCAC AATTCCTCCG TGACTGCCAA CTCCCAGTCC 

601 CCGGCCCTTC TGGCCGGCAC CAACCCCGTT GCTGTCGTCG CGGATGGAGG 

651 CAGTTGCCCC GCACACTACC CGGTGCACGA GTGCGTCTTC AAGGGGGATG 

701 TGAGGAGACT CTCCTCTCTC ATCCGCACGC ACAATATCGG GCAGAAAGAT 

751 AATCACGGAA ATACTCCTTT ACACCTTGCT GTGATGTTAG GAAATAAAGT 

801 TACAGCTCTT TTGAGGAAGC TTAAGCAGCA ATCCAGGGAA AGTGTTGAAG 

851 AAAAACGACC TCGATTATTA AAAGCCCTGA AAGAGCTAGG TGACTTTTAT 

901 CTAGAACTTC ACTGGGATTT TCAAAGCTGG GTGCCTTTAC TTTCCCGAAT 

951 TCTGCCTTCC GATGCATGTA AAATATACAA ACAAGGTATC AATATCAGGC 

1001 TTGACACAAC TCTCATAGAC TTTACTGACA TGAAGTGCCA ACGAGGGGAT 

1051 CTAAGCTTCA TTTTCAATGG GGATGCGGCG CCCTCTGAAT CTTTTGTAGT 

1101 ATTAGACAAT GAACAAAAAG TTTATCAGCG AATACATCAT GAGGAATCAG 

1151 AGATGGAAAC AGAAGAAGAG GTGGATATTT TAATGAGCAG TGATATTTAC 

1201 TCTGCAACTT TATCAACAAA ATCAATTTCT TTCACGCGTG CCCAGACAGG 

1251 ATGGCTTTTT CGGGAAGATA AAACAGAAAG AGTAGGAAAC TTTTTGGCAG 

1301 ACTTTTACCT GGTGAATGGA CTTGTTATAG AATCAAGGAA AAGAAGAGAA 

1351 CATCTCAGTG AAGAGGATAT TCTTCGAAAT AAGGCCATCA TGGAGAGTTT 

1401 GAGTAAAGGT GGAAACATAA TGGAACAGAA TTTTGAGCCG ATTCGAAGAC 

1451 AGTCTCTTAC ACCGCCTCCT CAGAACACTA TTACATGGGA AGAATATATA 

1501 TCTGCTGAAA ATGGAAAAGC TCCTCATCTG GGTAGAGAAT TGGTGTGCAA 

1551 AGAGAGTAAG AAAACGTTTA AAGCTACGAT AGCCATGAGC CAGGAATTTC 

1601 CCTTAGGGAT AGAGTTATTA TTGAATGTTT TAGAAGTAGT AGCTCCCTTC 

1651 AAGCACTTTA ACAAGCTTAG AGAATTTGTT CAGATGAAGC TTCCTCCAGG 

1701 CTTTCCTGTA AAATTAGATA TACCTGTGTT TCCCACAATC ACAGCCACTG 

1751 TGACTTTTCA GGAGTTTCGA TACGATGAAT TTGATGGCTC CATCTTTACT 

1801 ATACCTGATG ACTACAAGGA AGACCCAAGC CGTTTTCCTG ATCTTTAACT 

1851 GACGTGGAAA AGGATGCCGT CTAACCAAGG AAAGAAAATA CAGAGACCCT 

1901 AGAAGTGGAT CCAAATAGAA GGGACAAATG CTTTCAGTGA AGAAAAGGGA 

1951 ATTACACATT GAATCGACAC ATCAGTAATA CGATACAGTG AAATGGGCCT 

2001 CTAATAAGAA TTTCAGCGAG TTTTCTGATG TGCCATTTTT TGTCTTTTTA 

2051 AAAATATACA TATTATAAAT GTAATAGTTT GACACATTAA TGACCCTAAG 

2101 ACCTGCGTAT GTGAAGCAGC TATGAGTGCT GTGATTTGTT TTTAAAAATT 

2151 TTTACACTTC TTGTTGAAAT ATATATGCAT ATAAATATAT CTATATCTAT 

2201 ATCTATATCT AAAACACTCC TGGACCATTA ACGTAAATTA AATGTCTTAA 

2251 GAGATATGGA GCCCTTTTAA ACTTGTCATC TTTATGCAAG GTGACATTTA 

2301 TAAATATTCC TTCGAGCTTT GTTTTCATAA AATGTAAACT ATGTAACATT 

2351 ATGTATAGTT CAGTAATTTG AATGTTTGTT CAATATAATG AACTAGAAGG 

2401 AATGCAATTT TCTGTAGATG AATGAACCAA ATGGTAACCA TTAAACAATT 

2451 GCATTTATAT GTTGCAATAC ATTTCAGAAG GAGCGTTCAC TCTGCAGGGA 
2501 ATAAGGTACC TCCTTTAGCA CCTTAGTGCA ATTCATTGTG GTGCTATTTG 

2551 TTTTTACCTG AATGTTTGTT ACTAATCTTC CTTTCATAGA ACCTCTATTT 
2601 TTTTTTTTTC TAAACTTGAG TTTGAGTCCT TGTTATGGTC ATCATAAGGT 
2651 AATGGTTAGC ATGTTTAAAG ATATTCCTCT TCCAAATCTC AGCACTTTAA 
2701 AAAAAAATCC AAATTTTTAA ACTTGCTTCC TAATAAGTAC ACATCGGTCT 
2751 GATTATTTTG TTTGTTTTTA GTAGAATATG GATGCATTGG TGTCAGTTTT 
2801 AAAAAACAAT ACACATATTT TGGACAACCC TACATATTTA ATCCTTTCAA 
2851 AATAAGATAA AAACATTTTA TATGCTAACA GAATATATTT GTTACAAGTT 
2901 AAAGTCCAGA AGTATACACA AGATTGATTA CTCCTATTAT TTTTTTTAAA 
2951 TCACAGGAAA ATATTGATTT CATTGTCTCC AAAGTGATAA AATCTTGTAT 
3001 TACTCATTTT TGCACTTAAA ATTTTTCTTA TTTATTCCAA GGTGGTTTGA 
3051 AGGTCCAAGT ATGAAAATAA ATTAGGGGGA TTAATGTATA ACAGTTATAA 
3101 AGTATCATGT TGTATTAAAG AGCTTACTTA GATTGATGTT TTTAAAATGT 
3151 ATCCTGATGA ATGTCTCAAG AATGCATCTG TCAAGTTTTT TAGACTGACC 
3201 AGTAGCTTAA ACTTTTTTCA GGATTTTAGG TAATTTGAAA GGAGTTTAGA 
3251 GACCCTTATT GAAAATATGA TTTAAAAATC CAAAGCATAA ACCGTAAGAA 
3301 AAATTTTAAA TAAACATCTT TAAAGCTGAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS121353 from database EMBL: 
human STS WI-14729. 
Score = 1697, P = 1.9e-69, identities = 363/379 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 1845 bp; peptide length: 506 
Category: similarity to unknown protein 



1 MTGEKIRSLR RDHKPSKEEG DLLEPGDEEA AAALGGTFTR SRIGKGGKAC 

51 HKIFSNHHHR LQLKAAPASS NPPGAPALPL HNSSVTANSQ SPALLAGTNP 

101 VAVVADGGSC PAHYPVHECV FKGDVRRLSS LIRTHNIGQK DNHGNTPLHL 

151 AVMLGNKVTA LLRKLKQQSR ESVEEKRPRL LKALKELGDF YLELHWDFQS 

201 WVPLLSRILP SDACKIYKQG INIRLDTTLI DFTDMKCQRG DLSFIFNGDA 

251 APSESFVVLD NEQKVYQRIH HEESEMETEE EVDILMSSDI YSATLSTKSI 

301 SFTRAQTGWL FREDKTERVG NFLADFYLVN GLVIESRKRR EHLSEEDILR 

351 NKAIMESLSK GGNIMEQNFE PIRRQSLTPP PQNTITWEEY ISAENGKAPH 

401 LGRELVCKES KKTFKATIAM SQEFPLGIEL LLNVLEVVAP FKHFNKLREF 

451 VQMKLPPGFP VKLDIPVFPT ITATVTFQEF RYDEFDGSIF TIPDDYKEDP 
501 SRFPDL 



BLASTP hits 



Entry CEC01F1_3 from database TREMBL: 

gene: "C01F1.6"; Caenorhabditis elegans cosmid C01F1. 

Score = 371, P = 4.5e-61, identities = 69/138, positives = 96/138 

Entry CEC18F10_9 from database TREMBL: 

gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 

Score = 383, P = 3.4e-39, identities = 103/349, positives = 182/349 

Entry AF064604_1 from database TREMBL: 

product: "KE03 protein"; Homo sapiens KE03 protein mRNA, partial cds . 
Score = 348, P = 8.3e-32, identities = 95/295, positives = 148/295 



Alert BLASTP hits for DKFZphfkd2_46d13, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_46d13, frame 1 



Report for DKFZphfkd2_46d13.1 



[LENGTH] 506 

[MW] 57003.12 

[pI] 6.40 

[HOMOL] TREMBL:CEC18F10_9 gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 2e-35 

[BLOCKS] BL01288E 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 7 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 7.51 % 



SEQ MTGEKIRSLRRDHKPSKEEGDLLEPGDEEAAAALGGTFTRSRIGKGGKACHKIFSNHHHR 
SEG xxxxxxxxxxxx 
PRD ccceeeeeccccccccccccccccccchhhhhhhccccccccccccceeeeeeecchhhh 

SEQ LQLKAAPASSNPPGAPALPLHNSSVTANSQSPALLAGTNPVAVVADGGSCPAHYPVHECV 
SEG . . . . xxxxxxxxxxxxxxxx 
PRD hhhhhhccccccccceeecccccccccccccceeecccccceeeecccccccccccceee 

SEQ FKGDVRRLSSLIRTHNIGQKDNHGNTPLHLAVMLGNKVTALLRKLKQQSRESVEEKRPRL 
SEG 
PRD eccchhhhhhhhhhcccccccccccccceeeecccchhhhhhhhhhhhcchhhhhhhhhh 

SEQ LKALKELGDFYLELHWDFQSWVPLLSRILPSDACKIYKQGINIRLDTTLIDFTDMKCQRG 
SEG 
PRD hhhhhhccccceeehhhhhccceeeeccccccceeeeeccceeeeeeeeecccccccccc 

SEQ DLSFIFNGDAAPSESFVVLDNEQKVYQRIHHEESEMETEEEVDILMSSDIYSATLSTKSI 
SEG xxxxxxxxxx 
PRD ceeeeeccccceeeeeeeecccceeeehhhhhhhhhhhhhhhhhhhhccceeeecccccc 

SEQ SFTRAQTGWLFREDKTERVGNFLADFYLVNGLVIESRKRREHLSEEDILRNKAIMESLSK 
SEG 
PRD eeeecccceeeecccchhhhhhheeeeeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhc 

SEQ GGNIMEQNFEPIRRQSLTPPPQNTITWEEYISAENGKAPHLGRELVCKESKKTFKATIAM 
SEG 
PRD cceeeccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ SQEFPLGIELLLNVLEVVAPFKHFNKLREFVQMKLPPGFPVKLDIPVFPTITATVTFQEF 
SEG 
PRD hhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeeeeeehhhhhhhcc 

SEQ RYDEFDGSIFTIPDDYKEDPSRFPDL 
SEG 
PRD cccccccceeeccccccccccccccc 



Prosite for DKFZphfkd2_46d13.1



PS00001   82->86    ASN_GLYCOSYLATION   PDOC00001
PS00004   126->130  CAMP_PHOSPHO_SITE   PDOC00004
PS00004   373->377  CAMP_PHOSPHO_SITE   PDOC00004
PS00005   8->11     PKC_PHOSPHO_SITE    PDOC00005
PS00005   296->299  PKC_PHOSPHO_SITE    PDOC00005
PS00005   316->319  PKC_PHOSPHO_SITE    PDOC00005
PS00005   336->339  PKC_PHOSPHO_SITE    PDOC00005
PS00005   410->413  PKC_PHOSPHO_SITE    PDOC00005
PS00005   413->416  PKC_PHOSPHO_SITE    PDOC00005
PS00006   16->20    CK2_PHOSPHO_SITE    PDOC00006
PS00006   172->176  CK2_PHOSPHO_SITE    PDOC00006
PS00006   228->232  CK2_PHOSPHO_SITE    PDOC00006
PS00006   274->278  CK2_PHOSPHO_SITE    PDOC00006
PS00006   278->282  CK2_PHOSPHO_SITE    PDOC00006
PS00006   344->348  CK2_PHOSPHO_SITE    PDOC00006
PS00006   386->390  CK2_PHOSPHO_SITE    PDOC00006
PS00006   476->480  CK2_PHOSPHO_SITE    PDOC00006
PS00006   491->495  CK2_PHOSPHO_SITE    PDOC00006
PS00008   35->41    MYRISTYL            PDOC00008
PS00008   46->52    MYRISTYL            PDOC00008
PS00008   108->114  MYRISTYL            PDOC00008
PS00008   138->144  MYRISTYL            PDOC00008
PS00008   155->161  MYRISTYL            PDOC00008
PS00008   320->326  MYRISTYL            PDOC00008
PS00008   487->493  MYRISTYL            PDOC00008
PS00016   239->242  RGD                 PDOC00016



(No Pfam data available for DKFZphfkd2_46d13.1)

DKFZphfkd2_46j20



group: metabolism 

DKFZphfkd2_46j20 encodes a novel 224 amino acid protein similar to
2-hydroxyhepta-2,4-diene-1,7-dioate isomerase.

The new protein seems to be the human ortholog of 2-hydroxyhepta-2,4-diene-1,7-dioate
isomerase.

The new protein can find application in modulating the homoprotocatechuate degradative pathway
and as an enzyme for biotechnological production processes.



strong similarity to 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase
complete cDNA, complete cds, EST hits,

potential start at Bp 16 matches Kozak consensus ANCatgG
strong similarity to proteins of worm, plant, archaea and bacteria
2-hydroxyhepta-2,4-diene-1,7-dioate isomerase is part of
the tyrosine metabolism (degradation of tyrosine late step) EC 5. 3. In- 
complete cds according to similar C.elegans and A.thaliana protein 
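
The Kozak note above can be checked directly against the first 50 nt of the listing. The following is a minimal sketch, assuming that "ANCatgG" means A at -3, any base at -2, C at -1 and G at +4, and that "Bp 16" is a 1-based position; the helper name `kozak_atg_positions` is hypothetical.

```python
import re

# First 50 nt of the insert, copied from positions 1-50 of the listing.
head = "".join([
    "CACTTGATGG", "GAATCATGGC", "AGCATCCAGG", "CCATTGTCCC", "GCTTCTGGGA",
])

def kozak_atg_positions(seq):
    """Return 1-based positions of each ATG embedded in A-N-C-atg-G."""
    # A zero-width lookahead keeps overlapping candidates; the ATG begins
    # three bases after the leading A, and +1 converts to 1-based numbering.
    return [m.start() + 3 + 1 for m in re.finditer(r"(?=A.CATGG)", seq)]

kozak_hits = kozak_atg_positions(head)
print(kozak_hits)  # [16]
```

Under these assumptions the only match in this stretch is the ATG at position 16. The ATG at position 7, where the reported ORF begins, is preceded by T at -3 and so does not match this particular pattern.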

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1706 bp 

Poly A stretch at pos. 1686, polyadenylation signal at pos. 1667
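
The two positions quoted above can be reproduced from the last two rows of the sequence listing. This is only a sketch under stated assumptions: the signal is taken to be the canonical AATAAA hexamer, the poly A stretch is taken to be the first run of at least ten A residues (an arbitrary threshold chosen for illustration), and positions are taken to be 0-based offsets, which is consistent with this entry but is not stated in the source. The helper `offset_of` is hypothetical.

```python
import re

# Positions 1651-1706 of the 1706-bp insert, copied from the listing.
TAIL_START = 1651
tail = "".join([
    "GAAATTACTG", "TTTTCCAAAT", "AAAGGTGCTC", "CCTTCCAAAA", "AAAAAAAAAA",
    "AAAAAA",
])

def offset_of(pattern, seq, first_pos):
    """0-based offset (assumed convention) of the first regex match in seq."""
    m = re.search(pattern, seq)
    return None if m is None else (first_pos - 1) + m.start()

signal_offset = offset_of(r"AATAAA", tail, TAIL_START)   # polyadenylation signal
polya_offset = offset_of(r"A{10,}", tail, TAIL_START)    # start of the poly A run
print(signal_offset, polya_offset)  # 1667 1686
```

In 1-based numbering the same features start at 1668 and 1687.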



1 CACTTGATGG GAATCATGGC AGCATCCAGG CCATTGTCCC GCTTCTGGGA 
51 GTGGGGAAAG AACATCGTCT GCGTGGGGAG GAACTACGCG GACCACGTCA 
101 GGGAGATGCG CAGCGCGGTG TTGAGCGAGC CCGTGCTGTT CCTGAAGCCG 
151 TCCACGGCCT ACGCGCCCGA GGGCTCGCCC ATCCTCATGC CCGCGTACAC 
201 TCGCAACCTG CACCACGAGC TGGAGCTGGG CGTGGTGATG GGCAAGCGCT 
251 GCCGCGCAGT CCCCGAGGCT GCGGCCATGG ACTACGTGGG CGGCTATGCC 
301 CTGTGCCTGG ATATGACCGC CCGGGACGTG CAGGACGAGT GCAAGAAGAA 
351 GGGGCTGCCC TGGACTCTGG CGAAGAGCTT CACGGCGTCC TGCCCGGTCA 
401 GCGCGTTCGT GCCCAAGGAG AAGATCCCTG ACCCTCACAA GCTGAAGCTC 
451 TGGCTCAAGG TCAACGGCGA ACTCAGACAG GAGGGTGAGA CATCCTCCAT 
501 GATTTTTTCC ATCCCCTACA TCATCAGCTA TGTTTCTAAG ATCATAACCT 
551 TGGAAGAAGG AGATATTATC TTGACTGGGA CGCCAAAGGG AGTTGGACCG 
601 GTTAAAGAAA ACGATGAGAT CGAGGCTGGC ATACACGGGC TGGTCAGTAT 
651 GACATTTAAA GTGGAAAAGC CAGAATATTG AGTTATTTCT TAACAAGTTT 
701 CGAGAGAGAA GGGAGCAAGA CAAGAGCAAG CAACGGCTAT TAAATGTCAC 
751 AATCCTTTAA TTAGAAACCA TTTATTGGCC GGACGCGGTG GCTCACGCCT 
801 GTAATCGCAG CACTTTGGGA GGCCGAGGCG GGCGGCTCAC GACGTCAGGA 
851 GATCCAGACC ATCTTGGCTA ACAGGGTGAA ACCCCGTCTC TACTAAAAAT 
901 ACAAAAAATT AGCCGGGCGT GGTGGCGGGC GCCTGTAGTC CCAGCTACTC 
951 TGGAGGCTGA GGCAGGAGAA TCAATTGAAC CCGGGAGGCG GAGCTTACAG 
1001 TGAGCTGAGA TTGCGCCACT GTACTCCTGG GCAACAGCGA GACTCCGTCT 
1051 CAAAAAAAAA AAAAAAAAAA AGAAACCATT TATTTTAAAA ATGATTAGAT 
1101 TGCTATGCCT CAACTCATAG AAGATGAACC CTTCAAGAAA ACGTGAAGTA 
1151 GAACGGGTGG GCCAGAAATG AAAACAGGCA AGTAAAGTAT TTCTTCGGAA 
1201 AACATTTTAT CAAACCAAAT GTTAAAAAGA CTTTCCTTTT GTAAAACTGG 
1251 ATTAGAGAAG ACTTTTCAGT GGGTTATCTC TAGGATGATC AGTAGTTCAG 
1301 CACTTAAAAA CTGCAGAGAA AACTGAAAGT TATGTTCCAG ATAACTTTCC 
1351 GTTGTTTACC AAATTTTCTT AGATTTGGTC ATCATCAGGA AGCATTTGTA 
1401 AAAATAAAAA TCTCCACAAA TTACTGGCCC ATCTCGGACT TGCTGAATCA 
1451 ATTTGATAGG ATTAATCTCC AGTGAAGCTG TGTTTACAGG GCATTCCAAG 
1501 TGATTCTTAT CAGGAAATGT GAAAAACACT CCTGTACATA ATCGGTTAAT 
1551 TTAAAATTTT ACTTAATAAG TGAACAAGTA ATGAAGATTT CACCTGTTTA 
1601 CTTAGGGTAT CTACCCAGAC CCATCGATTC TGAGTTCGGG AGATGATTTT 
1651 GAAATTACTG TTTTCCAAAT AAAGGTGCTC CCTTCCAAAA AAAAAAAAAA 
1701 AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 

94039092: Purification, nucleotide sequence and some properties of a bifunctional
isomerase/decarboxylase from the homoprotocatechuate degradative pathway of Escherichia coli 
C. 

Peptide information for frame 1 



ORF from 7 bp to 678 bp; peptide length: 224 
Category: strong similarity to known protein 



1 MGIMAASRPL SRFWEWGKNI VCVGRNYADH VREMRSAVLS EPVLFLKPST 
51 AYAPEGSPIL MPAYTRNLHH ELELGVVMGK RCRAVPEAAA MDYVGGYALC 
101 LDMTARDVQD ECKKKGLPWT LAKSFTASCP VSAFVPKEKI PDPHKLKLWL 
151 KVNGELRQEG ETSSMIFSIP YIISYVSKII TLEEGDIILT GTPKGVGPVK 
201 ENDEIEAGIH GLVSMTFKVE KPEY 
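
The frame 1 peptide can be spot-checked against the nucleotide listing. The sketch below assumes the standard genetic code and uses hypothetical helper names (`CODON_TABLE`, `translate`); it translates the first 50 nt from position 7 and the stretch 651-681 that contains the stop codon.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, start):
    """Translate from a 1-based start until a stop codon or the last full codon."""
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)

# Positions 1-50 and 651-681 of the insert, copied from the listing.
orf_head = "".join(["CACTTGATGG", "GAATCATGGC", "AGCATCCAGG", "CCATTGTCCC", "GCTTCTGGGA"])
orf_tail = "".join(["GACATTTAAA", "GTGGAAAAGC", "CAGAATATTG", "A"])

n_terminus = translate(orf_head, 7)   # ORF starts at bp 7
c_terminus = translate(orf_tail, 2)   # bp 652 is in the same frame as bp 7
print(n_terminus, c_terminus)  # MGIMAASRPLSRFW TFKVEKPEY
```

Both fragments agree with the ends of the 224-residue peptide shown above, and the TGA stop codon follows directly after bp 678.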

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfkd2_46j20, frame 1

PIR:S44919 ZK688.3 protein - Caenorhabditis elegans, N = 1, Score =
537, P = 8.7e-52

PIR:D71109 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase -
Pyrococcus horikoshii, N = 1, Score = 529, P = 6.1e-51

PIR:C71425 hypothetical protein - Arabidopsis thaliana, N = 1, Score =
519, P = 7e-50

PIR:A64864 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase b1180
- Escherichia coli, N = 1, Score = 474, P = 4.1e-45



>PIR:S44919 ZK688.3 protein 
Length = 214

HSPs:



Caenorhabditis elegans



Score = 537 (80.6 bits), Expect = 8.7e-52, P = 8.7e-52
Identities = 99/211 (46%), Positives = 138/211 (65%)

Query: 10 LSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPILMPAYTRNLH 69

L+ F IVCVGRNY DH E+ +A+ +P+LF+K ++ EG PI+ P +NLH

Sbjct: 4 LAGFRNLATKIVCVGRNYKDHALELGNAIPKKPMLFVKTVNSFIVEGEPIVAPPGCQNLH 63

Query: 70 HELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWTLAKSFTASC 129

E+ELGVV+ K+ + ++ AMDY+GGY + LDMTARD QDE KK G PW LAKSF SC
Sbjct: 64 QEVELGVVISKKASRISKSDAMDYIGGYTVALDMTARDFQDEAKKAGAPWFLAKSFDGSC 123

Query: 130 PVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKIITLEEGDIIL 189

P+ F+P IP+PH ++L+ K+NG+ +Q T MI F IP ++ Y ++ TLE GD++L
Sbjct: 124 PIGGFLPVSDIPNPHDVELFCKINGKDQQRCRTDVMIFDIPTLLEYTTQFFTLEVGDVVL 183

Query: 190 TGTPKGVGPVKENDEIEAGIHGLVSMTFKVE 220

TGTP GV + D IE G+ ++ F V+
Sbjct: 184 TGTPAGVTKINSGDVIEFGLTDKLNSKFNVQ 214

Pedant information for DKFZphfkd2_46j20, frame 1

Report for DKFZphfkd2_46j20.1

[LENGTH] 224 

[MW] 24843.07 

[pi] 6.96 

[HOMOL] PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 8e-55 

[FUNCAT] r general function prediction [M. jannaschii, MJ1656] 9e-40 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL168c] 4e-38 

[EC] 5.3.3.10 5-Carboxymethyl-2-hydroxymuconate delta-isomerase 1e-35

[PIRKW] isomerase 1e-35

[PIRKW] intramolecular oxidoreductase 1e-35

[SUPFAM] 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 1e-46

[PROSITE] MYRISTYL 4

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] Alpha_Beta 



SEQ MGIMAASRPLSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPIL 

PRD cccccccccchhhhhhcceeeeeecchhhhhhhhhccccccceeeecccccccccccccc 

SEQ MPAYTRNLHHELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWT 

PRD cccccchhhhhhheeeccccccccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccc 

SEQ LAKSFTASCPVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKII 

PRD cccccccccccceeeecccccccccceeeeecccccccccccccceeechhhhhhhhhhh 

SEQ TLEEGDIILTGTPKGVGPVKENDEIEAGIHGLVSMTFKVEKPEY 

PRD hccccceeeeccccccccccccceeeeeeccccccccccccccc 



Prosite for DKFZphfkd2_46j20.1



PS00005   104->107  PKC_PHOSPHO_SITE    PDOC00005
PS00005   192->195  PKC_PHOSPHO_SITE    PDOC00005
PS00005   216->219  PKC_PHOSPHO_SITE    PDOC00005
PS00006   104->108  CK2_PHOSPHO_SITE    PDOC00006
PS00006   181->185  CK2_PHOSPHO_SITE    PDOC00006
PS00008   2->8      MYRISTYL            PDOC00008
PS00008   75->81    MYRISTYL            PDOC00008
PS00008   116->122  MYRISTYL            PDOC00008
PS00008   191->197  MYRISTYL            PDOC00008
PS00009   78->82    AMIDATION           PDOC00009



(No Pfam data available for DKFZphfkd2_46j20.1)

DKFZphfkd2_46k19



group: transcription factors 

DKFZphfkd2_46k19.3 encodes a novel 130 amino acid protein similar to rat DCoH, a bifunctional
protein-binding transcriptional co-activator.

DCoH is a bifunctional protein, complexed with biopterin. It serves as dimerization cofactor
of hepatocyte nuclear factor-1 and catalyzes the dehydration of the biopterin cofactor of
phenylalanine hydroxylase.

The new protein can find application in modulating/blocking the expression of genes controlled 
by the hepatocyte nuclear factor-1. 



strong similarity to pterin-4-alpha-carbinolamine dehydratase 

potential start at Bp 102 according to similar proteins, 
both genomic sequences are from chromosome 5, 

Sequenced by MediGenomix 

Locus: map="5" 

Insert length: 5641 bp 

Poly A stretch at pos. 5617, polyadenylation signal at pos. 5598



1 CAGCCCTCGG CAGACGGCCA ATGGCGGCGG TGCTCGGGGC GCTCGGGGCG
51 ACGCGGCGCT TGTTGGCGGC GCTGCGAGGC CAGAGCCTAG GGCTAGCGGC
101 CATGTCATCA GGTACTCACA GGTTGATTGC AGAGGAGAGG AACCAAGCTA
151 TACTTGACCT TAAAGCAGCA GGATGGTCGG AATTAAGTGA GAGAGATGCC
201 ATCTACAAAG AATTCTCCTT CCACAATTTT AATCAGGCAT TTGGCTTTAT
251 GTCCCGAGTT GCCCTACAAG CAGAGAAGAT GAATCATCAC CCAGAATGGT
301 TCAATGTATA CAACAAGGTC CAGATAACTC TCACCTCACA TGACTGTGGT
351 GAACTGACCA AAAAAGATGT GAAGCTGGCC AAGTTTATTG AAAAAGCAGC
401 TGCTTCTGTG TGATTTCTTC CAAAATACAT AAGTCTGAGA GGCTAAACTT
451 GATGGCTGTG TTAACATATG TCACGTGTAG CACAGTGGAG AAAGCAGGAT
501 ATGGCTCATA ATGACAGTGG TGAAGACCTG CGAATGAAGT TGCTAGTTAA
551 CACCTACATT AGGGTTTGAC ATAGGTCTAT GTTATGGCTC GCTGCATCTG
601 CTGGAACTCA CAGACTTTAC TATAGAGAAT CAAAGATCCC GTATCCGAAG
651 TCTATGGAAA TGCTCATGGT GGTAAATTCC AACAGAATGA AACACCAAAC
701 TTGCTTAAAG TAACTCACGT TTCAATTTGA AAGAGATATT GTCAAAATTG
751 GAGGCCCCCA GGTTCCTGTC TGTTCCAAAT CTTTGCATGA TGACAGTGGT
801 TTCTCTGATG TGGTAAGCTT TGGCTTTCTT CTGTTTTCTT TCTAAAAGAT
851 CACTGGAGTA GAGAGGAGTT AAACAGACAT GACCTTTGAC CTCTTGCATG
901 ACCTCCACAG ATAGCAAACC GGGCCGACAC ATGGTTGACG ATGTCCTTTT
951 CTACAATGAA GTTAATGAAA GTTCTGAAAA TAGTGATTAC TTTCTGACAT
1001 TGATAGGATT TAGGAAACCT CTGGATAAAT ACCTTAAGCA TGGCTGTTTA
1051 TGTTTTTGCT ATAGACAAAA AGCAGCAGCA TGTACATTGT ATTTGGACAC
1101 AAGCCTGCCT CGGTTAATAT ATTGAACTAT TGGACCACTA GGGTTAGTAG
1151 GGAGCGGTCT GTACACTTTC TGATTCAGCA TTCAGAAACA TTCTAGGTGG
1201 ACTCTGTAGC TTTCAGTTTT GTAAAGTTAT CGGAAAAACA TCGGGAGGGT
1251 TTGGCCATCA TATGTGAGCT TTGTGTTTCA ATGCCAGTTA CTCAGGATTA
1301 GTAAATTAAT GACTGTCCAG AGGACTTCAG GGTCACCAAG CTGCTGCACC
1351 TGCCATTGGC TGACTCTCCC CGGCTATCTG TGGCTGAGAT GGTGCTGCTT
1401 AGGTCACGCA GAGCATGAGC TGCTGCTGAA AGGGCACAGG AGATGGCCCT
1451 TGGGCTTCTC ATCCCAGGAT GCCTGCCCTG CCCACCAATC CATGAGAAGA
1501 TATGTATGAT TTCAGTAGGC CCTGGATCAG CTTGTCACCT CTGGTTTCCT
1551 GTTTGCTTTC CACTCACTCA GCTGGAGTTT CATTTCCAGA CTAAAGTCTT
1601 CATCATTGGC TTCAGAAACA GCATTCATCT GTGGCTGTGC TGATGTAGTA
1651 CACCAAGAAC AACTGGGCTC TTCTCTGTCA CTTTCAGTGG GCTACCTTCC
1701 CTCACCTCTC CAAGCAGCAT GAAAGAATTC TTTACATTTT TAATCTCTTT
1751 TTTGTTTTTC CCTGAAAGTA TGCTTTGGTG CTTAAAGAGA GAAGTCACAA
1801 AAGTATACTA CTGAGTTTCC TGGAGATGAA ATCCTGTTGT CCCTAGCTAT
1851 GTGAATGAGC ACAGGGATCC CTGATGCCAT TATTTTGTAT ATTCATACGG
1901 CACACACTTA CTGAGGGCCT TCTGTGTGCC CTAGGGGATT GAGCACAGTG
1951 ACATATCAGG GCAGGTAGAA ACAGATGGAG AGCTGATGCG GGCTGTCTTA
2001 GAGCAGCTGC CCCAGGAGGC CCCTGTGGAT GGATGTTGGG CAGGAGCCCT
2051 GAGACGTTAG GGGCATATAA CTAAAGGACA TAGCAGGAGT TATAGGAGGA
2101 GCTGATCCCT GAGGGAAACA ATGAAGACGG AGAAGATGGG GCTAAAGTTT
2151 GAATTGTGGG GACATTAATC ACGGTGATTC TTAAAACTTT GCTGTTGATG
2201 ATTTTAAATG GAGAAAATGA GTACGTAAGA TGTTATTTCC CAGTTCAGTA
2251 TATAGGTTGC CCACAAAGTA TTTTCCTACC ATGAATGGTC ATATATACTT
2301 GTTGTAGAAT ACCAGGGACA GCAGAGATGG TGGGGTAGTT ACTTCCTTTT
2351 CTTACAGCCC AAGAACTTTG GTGTCCAGGA GATTGACCAA TTTAGCCACT
2401 GAGCATTTAA TACAACACAG GGCTACCCAG ATCCCACTGT CCTGATTTGC
2451 CCTGAAAGCC AAAGGAGTCA GGAGAAGGTG AGTGGGGTGA ATATATTAAT
2501 CCTGAGAGTT GAACAGAGCA AAAATCCCTA TTACTTTTGT ACTTAAAACA
2551 TCTCTGCCAC ATGTGCTCAC TCTTTATATT CTGTTTAGGT GGTTTATATG
2601 TGCACATCCC ATCCTATGCC TGCAGTTAGC CAACTCAGGG TTTATATTGC
2651 CTCCTTTCTT TTTTTCTTTT NNNNNNNNNN TTTTAAGAGA TGGGGTCTCG
2701 TTCTGTCATG CAGACTGGAG TGCAGTGGTG TGATCACAGC TCATTGTAAC
2751 CTCCAACGCC TGGACTGAAG TGATCCTCCT GCCTTGGCCT CTCTGGTAGC
2801 TGGGACTACA GGTGCATGCC ACCACACCCA CCTAATTTTT TTTATTTTTA
2851 TTTTTTGTAG AGACAGTCTC ACTATCTTGC TCGGGCTGGT CCTGAACTCC
2901 TGGGCTCAAG TTATCTTGCT GCCTCAGCCT CCCATGGGTA ATCTTTATTT
2951 CCTTTTTTTT TTTTTTTTGG AGATGGAGTT TCGCTCTTGT CGCCCAGGCT
3001 GGAGTGCAAT GGCACGATCT TGGCTCACTG CAGTCTCCAC CTCCTGGGTT
3051 CAGGTGATTC TCCATCCTCG GCCTACTGAG TAGCTGAGAT TACAGGCAAC
3101 TGCCACCATG CGCGGCTAAT TTGTGTATTT TTTTTTAGTA AGAGATGGGG
3151 TTTCGCCATG TTGGCCGGAC TGGTCTTAGA CTCCTGACCT CAAGCGACCT
3201 GCCTGCCTTG GCCTCCCAAA GTGCTGGGAT TACAGGCATG AGCCGCTATG
3251 CCTCGTCGCT GATTTTTATT TCTTATTTTT TTTTTAGAGA TGGGGGTCTC
3301 ACTATGCTGC TCAGGCTGAT CTCAAACTCC TGGCCTCAAG TGATCCTCCC
3351 ACCTTAGCCT CCCAAGTTGC TGGGATTATA AGTGTGAGCC ACTATCCCTA
3401 CCTCACTATT ACCTTCTTTG CTTCTCTTGT TTTCTTTTGT TCTAAGTCAA
3451 ACCCATCACA ATCTTTTCTT GTCCTTCCAG GTGTTTTCCA GTGCTGTGCC
3501 CTGGATGTGC TCTCTTTCTC TTAGAGCCCA GAGAACTTGC TTTTCCCCCT
3551 TATATATGAC CCTTAACTTT TTCTAACACA TTATTAAGGG CCTGTGTCTA
3601 TCAGCTGGGG GCACTTCTTG AAGGGAGGGC CTTTGTGTGG TCTGTTTCTA
3651 GTGACTTCCA GCTTTAACCC AGAGCCTCAT GATTGCTGGG TGCCCATAGC
3701 CTTTTTGCTG AATGGAGGCA CTCAGTCTCC TTGGGAAGAG AGAATCCATG
3751 ATAGACCCAC TTGGGAGCTC CCCACTTCAG GGGCCTACAC ACTGGTAATG
3801 CAACAGAATG CCCAAGAGTG ACCTCATAAA GCAAGGATTC CCTTCGTGGC
3851 CCCTTCTCTG CTGCCTCTCA GAATCCAGAC GCTAAGGAAA ATCCCTAAGC
3901 AGAGATTTTC TGTTGGATGC TAAAAGCAAG GAATAAAAGT TGAAAATTTG
3951 GAAAATGTCT CAACACCGTC ACCAGCGCCA CTCGAGAGTC ATTTCTAGTT
4001 CACCAGTTGA CACTACATCG GTGGGATTTT GCCCAACATT CAAGAAATTT
4051 AAGTAAATAT TATCTATCTC CATTGCCTGT TAAGAAATGT GCTAGTAGAA
4101 GTGTGAGGGC AGGGTGTCAG TGTTCTCTCA GCCTCTTCCC TCAGATACTC
4151 GTCTGCTTAC CAAAATAAGT TGCATGTCCT TGACAATCTG GTTTCTATGA
4201 TTGGTGAGGC TGGCATGCTA TTACCTTTAT GTGCCCTGTA GACTTGAATG
4251 ACCAGTTTGA CCAGTTTGAC TGTTAGATAA TCAGAAGGCT TTTCTCTTTT
4301 TTTATAATAG ACCCCATCTC AAATCAGATA ATGAAAATTA CATATCTTGA
4351 TATATTAGAA AAGTATATAC ATTCTGGCTG GGCACGGTGG CTCACGCCTG
4401 TAATCCCTGC ACTTTGAGAG GCTGGGGCGG ATCACTTGAG GTCAGGAGTT
4451 TGAGACCGGC CTGGCCAGCG TGGCGAAACC CCATCTCTAC TAAAAATACA
4501 CAGATTAGCC CGGAGTGATG GTGTGCACCT GTTGTCCCAG CTACTCAGGA
4551 TGCTGAGGCA GGAGAATCCC TTTAACCTGG GGGGCGAAGG TTGCAGTGAG
4601 CCAGGATTGC ACCACTGCAC TCCAGCCTGG GTGACGGAAC GGGACTCTGT
4651 CTCAGAAAAA AAAAAAAAGA AGAGGAAAAA GAAAAATATA TATTCTATAT
4701 TTTTTTAACT TATGAGAATG TGTTCATTTC ATTTGTAACA TATAATGGGA
4751 AACAGTAATA CGTACTCTGA GAAAAATTGC AAAGCACAGA TAAATCCAAA
4801 TAAACAGGAA AAAGAATCAC CTATAACCTC ACCATCCATA GACAGACACT
4851 GTTAAAATTT TGGCATATTT CCTGCTGATT TTTTCTACTG CTGATTTTTG
4901 CACAGGTGAG ATAATTTTGA ACAGAGAATT TTGTATCTTT GGTTTTTGTG
4951 TTTCGCTGCA CACAAAAACA AAAGATATAA AAATGGATCA TAAACATTTT
5001 TCTAAATCCT GAAAAGTGCA TAGACATATT TTAGTGCCTG TATTTCACAA
5051 GATGGACATA CCATAATTTA CTTACACAGT CCTTTTTGTT AGATGTTTAA
5101 GTTGTTTTCA AGCTTCTCAG TGCTGGAAAA AATACTGAGA TAGACATGTT
5151 TAGTTGAAGT TATTTCATTT CAGGTTATAT TATCTTGGGT CAGAGAATGA
5201 ATGGTTCTCA GGCTTTTCAA AAGAGCTGGT CAGTTTTTAT GCCTCTGGCA
5251 GTTTTTGAGA GTGCTCAATC ATACTACACT GTTGCCAGCA TTAGATCTTA
5301 TCACATTTAA GTCATTGCTA ATTTTATAAA CAAAAACAAT GGTTTTACTT
5351 TGCATCTCCC TGATTGGTGT TGCTGTAGAA CATATTTGGA GAAGTTTGTT
5401 TGTCTTTGGT GTTTATTCCA TGAATAGATT GTGTGCCCAT TTTCTCTTGG
5451 GGTATTCAGT TTTTTATTAC TGATGTGAGC ATGTGTATGG GTGATTATTT
5501 GATGATTATC AGTTTTGCTT AGTAGACTGG CAATATTTAG TCTTGCTGTC
5551 ACTGTGTTCC CAGTGCCAAC TAGATTGCTT GATATGTAGT TGCCACTCAA
5601 TAAAGATTTG TTGAGTCAAT GAAAAAAAAA AAAAAAAAAA A



BLAST Results 



Entry AC004764 from database EMBL: 

Homo sapiens chromosome 5, P1 clone 255g5 (LBNL H61), complete
sequence. 

Score = 11057, P = 0.0e+00, identities = 2217/2224 
Bp 428-5625 of cDNA == Bp 2912-8107 of AC004764 

Entry HSAC1555 from database EMBL: 

Homo sapiens (subclone l_d8 from BAC H75) DNA sequence, complete 
sequence . 

Score = 575, P = 5.1e-30, identities = 115/115 
Bp -240- 430 of cDNA == HSAC1555 splice pattern 

Medline entries



93186787 : 

Phenylalanine hydroxylase-stimulating protein/pterin-4 
alpha-carbinolamine dehydratase from rat and human liver. 
Purification, characterization, and complete amino acid 
sequence . 

93101632 : 

Identity of 4a-carbinolamine dehydratase, a component of 
the phenylalanine hydroxylation system, and DCoH, a 
transregulator of homeodomain proteins. 

95242099: 

Crystal structure of DCoH, a bifunctional, protein-binding 
transcriptional coactivator 



Peptide information for frame 3 



ORF from 21 bp to 410 bp; peptide length: 130 
Category: strong similarity to known protein 



1 MAAVLGALGA TRRLLAALRG QSLGLAAMSS GTHRLIAEER NQAILDLKAA 
51 GWSELSERDA IYKEFSFHNF NQAFGFMSRV ALQAEKMNHH PEWFNVYNKV
101 QITLTSHDCG ELTKKDVKLA KFIEKAAASV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46k19, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46k19, frame 3



Report for DKFZphfkd2_46k19.3



[LENGTH] 130 

[MW] 14377.56 

[pi] 9.17 

[HOMOL] PIR:A47189 pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) - rat 4e-34 

[FUNCAT] 01.07.99 other vitamin, cofactor, and prosthetic group activities [S. cerevisiae, YHL018w] 5e-04

[SCOP] d1dchg_ 4.38.1.1.1 Pterin-4a-carbinolamine dehydratas 4e-50

[EC] 4.2.1.96 Tetrahydrobiopterin dehydratase 6e-34 

[PIRKW] nucleus 6e-34 

[PIRKW] carbon-oxygen lyase 6e-34 

[PIRKW] homotetramer 6e-34 

[PIRKW] hydro-lyase 6e-34 

[PIRKW] cytosol 6e-34 

[PIRKW] acetylated amino end 6e-34 

[PIRKW] homodimer 6e-34 

[SUPFAM] pterin-4-alpha-carbinolamine dehydratase 6e-34 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 14.62 % 

SEQ MAAVLGALGATRRLLAALRGQSLGLAAMSSGTHRLIAEERNQAILDLKAAGWSELSERDA

SEG . xxxxxxxxxxxxxxxxxxx

1dchB CCCCHHHHHHHHHHHHHHCCEEECCCCE



SEQ IYKEFSFHNFNQAFGFMSRVALQAEKMNHHPEWFNVYNKVQITLTSHDCGELTKKDVKLA

SEG

1dchB EEEEEECCCHHHHHHHHHHHHHHHHHHCCCCEEEETTTEEEEEECBTTTTBTCCHHHHHH

SEQ KFIEKAAASV

SEG

1dchB HHHHHHHHHH



Prosite for DKFZphfkd2_46k19.3



PS00005   11->14    PKC_PHOSPHO_SITE    PDOC00005
PS00005   32->35    PKC_PHOSPHO_SITE    PDOC00005
PS00005   56->59    PKC_PHOSPHO_SITE    PDOC00005
PS00005   113->116  PKC_PHOSPHO_SITE    PDOC00005
PS00006   56->60    CK2_PHOSPHO_SITE    PDOC00006
PS00006   105->109  CK2_PHOSPHO_SITE    PDOC00006
PS00006   113->117  CK2_PHOSPHO_SITE    PDOC00006
PS00008   6->12     MYRISTYL            PDOC00008
PS00008   20->26    MYRISTYL            PDOC00008



(No Pfam data available for DKFZphfkd2_46k19.3)
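
The Prosite hits listed for this peptide can be reproduced with ordinary regular expressions. The sketch below is illustrative only: it rewrites the PROSITE patterns PS00005 ([ST]-x-[RK]), PS00006 ([ST]-x(2)-[DE]) and PS00008 (G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) as Python regexes, and it assumes that the listing reports a 1-based start and an exclusive end. The peptide is copied from the SEQ rows of the Pedant report; `PATTERNS` and `scan` are hypothetical names.

```python
import re

# 130-residue frame 3 peptide, copied from the SEQ rows of the report.
peptide = "".join([
    "MAAVLGALGA", "TRRLLAALRG", "QSLGLAAMSS", "GTHRLIAEER", "NQAILDLKAA",
    "GWSELSERDA", "IYKEFSFHNF", "NQAFGFMSRV", "ALQAEKMNHH", "PEWFNVYNKV",
    "QITLTSHDCG", "ELTKKDVKLA", "KFIEKAAASV",
])

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",             # PS00005
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",            # PS00006
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",   # PS00008
}

def scan(seq, regex):
    """Return (start, end) pairs: 1-based start, exclusive end, overlaps kept."""
    hits = []
    for m in re.finditer("(?=(" + regex + "))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits

sites = {name: scan(peptide, regex) for name, regex in PATTERNS.items()}
for name, hits in sites.items():
    for start, end in hits:
        print(f"{name} {start}->{end}")
```

The lookahead group is used so that overlapping sites, such as the PKC and CK2 hits that both start at residue 56, are each reported.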

DKFZphfkd2_46m4



group: signal transduction 

DKFZphfkd2_46m4.3 encodes a novel 198 amino acid putative GTP-binding protein related to the
SAR1 family of Ras superfamily members.

SAR1 proteins are involved in vesicular transport between the endoplasmic reticulum and the
Golgi apparatus.

The new protein can find clinical application in modulating the transport of vesicles to the
Golgi apparatus, thus enabling post-translational modifications of the vesicles' contents.
Blocking of the molecule is expected to result in modulation/blocking of secretory pathways.



nearly identical to mouse GTP-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by MediGenomix 

Locus: /map="438.9 cR from top of Chr10 linkage group"
Insert length: 2996 bp

Poly A stretch at pos. 2969, polyadenylation signal at pos. 2958



1 ACATCCGGCG AGTAGCTGGC GGTCCCGGGT GCTGCTGGTT AGTGTGCTCT 
51 GAGGGAGGGT CCGAGCCAGC CGCTGTTTTG CCGGAGGAGC CCCTCAGGCC 
101 GTAGTAAGCA TTAATAATGT CTTTCATCTT TGAGTGGATC TACAATGGCT 
151 TCAGCAGTGT GCTCCAGTTC CTAGGACTGT ACAAGAAATC TGGAAAACTT 
201 GTATTCTTAG GTTTGGATAA TGCAGGCAAA ACCACTCTTC TTCACATGCT 
251 CAAAGATGAC AGATTGGGCC AACATGTTCC AACACTACAT CCGACATCAG 
301 AAGAGCTAAC AATTGCTGGA ATGACCTTTA CAACTTTTGA TCTTGGTGGG 
351 CACGAGCAAG CACGTCGCGT TTGGAAAAAT TATCTCCCAG CAATTAATGG 
401 GATTGTCTTT CTGGTGGACT GTGCAGATCA TTCTCGCCTC GTGGAATCCA
451 AAGTTGAGCT TAATGCTTTA ATGACTGATG AAACAATATC CAATGTGCCA 
501 ATCCTTATCT TGGGTAACAA AATTGACAGA ACAGATGCAA TCAGTGAAGA 
551 AAAACTCCGT GAGATATTTG GGCTTTATGG ACAGACCACA GGAAAGGGGA 
601 ATGTGACCCT GAAGGAGCTG AATGCTCGCC CCATGGAAGT GTTCATGTGC 
651 AGTGTGCTCA AGAGGCAAGG TTACGGCGAG GGTTTCCGCT GGCTCTCCCA 
701 GTATATTGAC TGATGTTTGG ACGGTGAAAA TAAAAGAGTT TTACTTCTCT 
751 GGACTGATCC TATTCACAGC TTCCTCATGA ACTTTTCTAA TAGAACAAGG 
801 ATAGCTCTCC AACCATGTCT GGCGTTGAGA AGCCAAGAGT CTCTGTCAAC 
851 TCTCTCATTG CCCAGTGGTG ACATGTGCTC TTCTCCACAC TGTTGGGAGG 
901 TAATGCTGCC CCACGTGCTG GTGCAGGTCA GTATCCTGGG ACTTGGAAGC 
951 TGGCAGGATT TGCCGGGTAA AGCTGTATGC CATCATGGGG CACCTGAAAA 
1001 GAAAAACACG TCTCACCACT GTGGTTGATT CAAAAGAAAG TGATTCTATT 
1051 TTTTAAAGAA AGCGTTGTTA ATGTAATTGG TATCCCTCCT AACTTTTTGA 
1101 GTTCACAATT TACTTGGTCC AGAGTTTTCT ATTCTTTTTT TTTTTTTAAA 
1151 CTAATGAATG ACATTTAGAT ACTTCATAAA ATTATGAACA GATATGGAGG 
1201 CCAGAGCTCA TTTGGGTAAA CTTACTCCTG CTGAGTTAGC AGGTTGGTGA 
1251 GAGAAGCTCC CCTGAGCTCA CCTGTCTCTC TGACTGCCTT GGAGTAGGTG 
1301 GCATAACCTT GTGCACAGAG AACTAGAAAA GGGGCAGAAC CCCGGCCTTG 
1351 CAGTTGTGGC AGGTTTCCAC TGTGGTAAGC TAGGTTCATT CCTCATCAAG 
1401 GAATGTGTAG CAGATTGTTC ACTGTGGAGG AGGTAATTAT AGAATGGGTT
1451 ATTGTTGTTA TTCTTACTCA TGAAGTTACA GATTTTAGCC AGTCTTTGCT
1501 TTTATACTTT TGTGAAATTT AATTTCTCTC TATAGCACCT TCCTTTTTCG 
1551 TTTTCAGTTA TCAAAAGTGA CTTTGACCTC ATAAGAGAGT TGAGAACATC 
1601 TCTCGTGTCA CATACTGCAG GTGCATCAGT TACTTTTGCA CAGATTCTAG 
1651 GGGGACATTT TTCTGAATAG GAAGACAGGA CAAAGTTAAC AGCTTAAGGG 
1701 CTCTTAATTC TGTGAGTTGA GGACTTAAAA GTATTGTAGC ATTTGTTTGG 
1751 ATCCATGAAA AATGTATTCA GTGGGCTTTA AAATTTCCAT TTGCAGAATT 
1801 TGGTCTCTCA GGCTGTTTGG GAGCTCTTTT TTTTACATTT TTTCTCCTTT 
1851 GACACCTATT TTATTGGTGT TTAAAGTAAA GGTTAACATC TGTAGCTTTT 
1901 CCAGGTTTTT TTTTTTTTTT TTGATATGAA ATTGTCTTTC TCCATTGCAG 
1951 AAATAAGCTA GGGAAACACT AACCCAAAAA CTTTCTGTAG AGCTGTTCCT 
2001 TTGGAGGCAG CATCACTTAT TGGCAGTAAA GACTCAGTAT AAAAGCACCA 
2051 GCATCCCTAC TTGGGTGATG GGGATTAATT TTATAGCATT CCATTTTCCT 
2101 AGTGCCACAT GTGAAATTGG ATTTTGATGA TCTTAATCTA TATTCTACCC 
2151 TTATAATAAA AGATCAAAAG ATATATCTCC TATGAACAGA TTGGAGATAG 
2201 GAGATGAAAA GTTGGGAGGA TGCCTTTATT CTAATGTGAG GGTAGGGAAA 
2251 ATGTGGATAA CATTACTGGG GTGAAGGAGG CATTGTTCTT TAGTTGGAGT 
2301 TCTCATTTTT ATTCTCCAGT ACTGACTTGT GGGGAAAGCA TACTTTTTCA 
2351 CTGCCAGGTA CTGAATGCAG AGGCTCAGTG AAGTATATAT GTGGGAAGTG 
2401 CATGCATTTC GTTTATTAGC AAACATAGCT GGATTAAGAC GAAGTTGTTG
2451 GTTTGGAAAG GGGTTAAAGC CTTAAGTGAA CAAATCTAGC TAACAGTGAA
2501 TGAACTAGGT AATATAACTT GCATATTTTT AATTTCCTTT GGTTAAAGGT 
2551 CCCCCATACT TCTCTGTTCG GAGACATGAG AAGTATGATT ACTTCAGTGT 
2601 TAGTTTTCTT AATTTTTTTT TTCCCCTATT TGTCCCTTGT CACTTTGTTG
2651 CAAGCTAGAA ATCTGTGGGT TATACATAGG GCAGCTCTTT GCGAAAGTGG
2701 TTTATTCCAC TGGAGAAAGG GGATTGAAAA TCAGTTAGAA CCAATGTATT 
2751 TCTTGCCCCA CGGAACACTA TTCCTATAAG ATAGCTGAAA GAAGCTGCTG
2801 TGAGGAGCTC AGCTCCAACA CAGGATCAGC ACCTTGTATA GGAATTCCCA 
2851 TGAATTATGA CTTCTCATTC TGTTTTATCA GAGTGCATAT ATGTCCTACT 
2901 TCAGGAAAAG TAAAACAGTC ATTTACGAAA GAAAGTCAAT CTGTATCCTA 
2951 AGCATTTTAA TAAAAAGTTA AAACAAAAAA AAAAAAAAAA AAAAAA



BLAST Results 



Entry HS679348 from database EMBL : 
human STS WI-16722. 
Length = 265 
Minus Strand HSPs: 

Score = 1242 (186.4 bits), Expect = 2.8e-50, P = 2.8e-50
Identities = 260/265 (98%) 



Medline entries 



94085558: 

Molecular analysis of SAR1-related cDNAs from a mouse
pituitary cell line. 



Peptide information for frame 3 



ORF from 117 bp to 710 bp; peptide length: 198 
Category: strong similarity to known protein 
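
The ORF coordinates and peptide lengths quoted in these reports are related by simple arithmetic. The sketch below assumes that the coordinates are inclusive and that the stop codon is not counted, which is consistent with the three ORFs quoted in this section (7 to 678, 21 to 410 and 117 to 710 bp); `peptide_length` is a hypothetical helper.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3

# (start, end) pairs as quoted in the peptide information blocks.
orfs = {"46j20": (7, 678), "46k19": (21, 410), "46m4": (117, 710)}
lengths = {name: peptide_length(*span) for name, span in orfs.items()}
print(lengths)  # {'46j20': 224, '46k19': 130, '46m4': 198}
```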



1 MSFIFEWIYN GFSSVLQFLG LYKKSGKLVF LGLDNAGKTT LLHMLKDDRL 

51 GQHVPTLHPT SEELTIAGMT FTTFDLGGHE QARRVWKNYL PAINGIVFLV 

101 DCADHSRLVE SKVELNALMT DETISNVPIL ILGNKIDRTD AISEEKLREI 

151 FGLYGQTTGK GNVTLKELNA RPMEVFMCSV LKRQGYGEGF RWLSQYID 



BLASTP hits 



Entry S39543 from database PIR: 
GTP-binding protein - mouse 
Length = 198 

Score = 1029 (362.2 bits), Expect = 5.1e-104, P = 5.1e-104
Identities = 197/198 (99%), Positives = 198/198 (100%)

Entry SARA_MOUSE from database SWISSPROT:
GTP-BINDING PROTEIN SARA.
Length = 198

Score = 1012 (356.2 bits), Expect = 3.2e-102, P = 3.2e-102
Identities = 195/198 (98%), Positives = 196/198 (98%)

Entry CEZK180_4 from database TREMBL: 

gene: "ZK180.4"; Caenorhabditis elegans cosmid ZK180. 
Length = 193 

Score = 679 (239.0 bits), Expect = 6.3e-67, P = 6.3e-67 
Identities = 125/197 (63%), Positives = 161/197 (81%) 
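
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded: 196/198 is 98.99% but is shown as 98%, and 161/197 is 81.7% but is shown as 81%. A minimal sketch of that inferred convention follows; `blast_percent` is a hypothetical helper and not part of any BLAST distribution.

```python
def blast_percent(matches, aligned):
    """Integer percentage, truncated toward zero as the listing appears to do."""
    return matches * 100 // aligned

# (matches, aligned length) pairs taken from the three BLASTP hits above.
pairs = [(197, 198), (198, 198), (195, 198), (196, 198), (125, 197), (161, 197)]
percents = [blast_percent(m, n) for m, n in pairs]
print(percents)  # [99, 100, 98, 98, 63, 81]
```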



Alert BLASTP hits for DKFZphfkd2_46m4, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46m4, frame 3



Report for DKFZphfkd2_46m4.3



[LENGTH] 198 

[MW] 22367.00 

[pi] 6.21 

[HOMOL] PIR:S39543 GTP-binding protein - mouse 1e-112

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL218w] 1e-58

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YPL218w] 1e-58

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 2e-23

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 4e-22

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 3e-20

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164C] 3e-19

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 2e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 2e-09

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YHR168w] 7e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 1e-04

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YKL154w] 1e-04

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 1e-04

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005C] 1e-04

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKL154w] 1e-04

[FUNCAT] 03.19 cellular import [S. cerevisiae, YML001w] 3e-04

[BLOCKS] BL00395A Alanine racemase pyridoxal-phosphate attachment site proteins 

[BLOCKS] BL01019B ADP-ribosylation factors family proteins 

[BLOCKS] BL01019A ADP-ribosylation factors family proteins 

[BLOCKS] BL01020D SAR1 family proteins

[BLOCKS] BL01020C SAR1 family proteins

[BLOCKS] BL01020B SAR1 family proteins

[BLOCKS] BL01020A SAR1 family proteins

[SCOP] d1plj 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 7e-36

[SCOP] d1guaa_ 3.25.1.3.10 Rap1A (Human (Homo sapiens) 8e-40

[SCOP] d1rrf 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-55

[SCOP] d1hurb_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-58

[SCOP] d1gota2 3.25.1.3.3 (1-54,171-326) Transducin (alpha subunit) [ra 2e-33

[SCOP] d1tadb2 3.25.1.3.2 (1-30,152-316) Transducin (alpha subunit 6e-36

[PIRKW] glycoprotein 4e-19

[PIRKW] monomer 1e-16

[PIRKW] P-loop 3e-64

[PIRKW] lipoprotein 4e-19

[PIRKW] GTP binding 3e-64

[SUPFAM] ADP-ribosylation factor 5e-22

[PROSITE] ATP_GTP_A 1

[PROSITE] MYRISTYL 3

[PROSITE] SAR1 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

[KW] Alpha_Beta

[KW] 3D 

SEQ MSFIFEWIYNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPTLHPT
1hurA TTTTTCCCCEEEEEETTTTCHHHHKHHHCCCCEEEEEEETTEE



SEQ SEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHSRLVESKVELNALMT

1hurA EEEEEETTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHH

SEQ DETISNVPILILGNKIDRTDAISEEKLREIFGLYGQTTGKGNVTLKELNARPMEVFMCSV

1hurA TTTTTTTEEEEEEETTTTTTTCCHHHHHHHHCGG

SEQ LKRQGYGEGFRWLSQYID

1hurA



Prosite for DKFZphfkd2_46m4.3



PS00001   162->166  ASN_GLYCOSYLATION   PDOC00001
PS00005   25->28    PKC_PHOSPHO_SITE    PDOC00005
PS00005   158->161  PKC_PHOSPHO_SITE    PDOC00005
PS00005   164->167  PKC_PHOSPHO_SITE    PDOC00005
PS00006   60->64    CK2_PHOSPHO_SITE    PDOC00006
PS00006   72->76    CK2_PHOSPHO_SITE    PDOC00006
PS00006   111->115  CK2_PHOSPHO_SITE    PDOC00006
PS00006   164->168  CK2_PHOSPHO_SITE    PDOC00006
PS00008   32->38    MYRISTYL            PDOC00008
PS00008   68->74    MYRISTYL            PDOC00008
PS00008   155->161  MYRISTYL            PDOC00008
PS00017   32->40    ATP_GTP_A           PDOC00017
PS01020   171->197  SAR1                PDOC00782

Pfam for DKFZphfkd2_46m4.3



HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM *GMgWfSIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgEIVTTIPT
++ FS++++++GL++K++++++LGLDNAGKTT+L+MLK++++ +++PT
Query 9 -YNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPT 56

HMM IGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaDRD
+++++E++++ +++F+++D+GG++++R++W++Y P+++GI+++VD+AD++
Query 57 LHPTSEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHS 106

HMM RMeEaKqELHaMLNEEELrDAPlLIFANKQDLPgAMSesEIREaLGLHel
R+ E+K+EL+A++++E ++++P+LI++NK+D+ +A+SE+++RE+ GL+ +
Query 107 RLVESKVELNALMTDETISNVPILILGNKIDRTDAISEEKLREIFGLYGQ 156

HMM RCn RPWYlQMCCAVtGEGLYEGMDWLSNYInkRkK*
+++ RP++++MC++++++G++EG++WLS+YI
Query 157 TTGKGNVTLKELNARPMEVFMCSVLKRQGYGEGFRWLSQYI 197

DKFZphfkd2_47a4



group: transcription factor 

DKFZphfkd2_47a4.1 encodes a novel 280 amino acid protein with similarity to zinc finger
proteins.

The new protein is a putative transcription factor with one C2H2 zinc finger.

The new protein can find application in modulating/blocking the expression of genes controlled
by this transcription factor.

similarity to C.elegans F46B6.7 

potential frame shift at 1092, will be checked see BLASTX 
Sequenced by MediGenomix 
Locus: map="7q31" 
Insert length: 1756 bp 

Poly A stretch at pos. 1737, no polyadenylation signal found



1 CCCTTTTCTT TTCTGCCGGG TAATGGCTGC TTCCAAGACC CAGGGGGCTG
51 TCGCCCGAAT GCAGGAAGAC CGTGATGGGA GCTGCAGCAC AGTCGGGGGT
101 GTAGGTTATG GGGTAAGGAT TGTATCCTGG AGCCGCTTTC CCTGCCAGAA
151 AGTCCAGGTG GCACCACCAC TTTAGAAGGT TCTCCATCTG TGCCTTGTAT
201 TTTCTGTGAA GAACATTTTC CTGTGGCTGA ACAAGACAAA CTTCTGAAGC
251 ACATGATTAT TGAGCATAAG ATTGTCATAG CTGATGTCAA GTTGGTTGCT
301 GATTTCCAAA GGTACATTTT ATATTGGAGG AAAAGGTTCA CTGAACAGCC
351 CATCACAGAT TTTTGTAGTG TAATAAGAAT TAATTCCACT GCTCCATTTG
401 AAGAACAAGA GAATTATTTT TTGTTATGTG ACGTTTTACC AGAAGATAGA
451 ATTCTTAGAG AAGAGCTTCA GAAACAGAGA CTGAGAGAAA TTCTGGAACA
501 ACAGCAGCAA GAACGAAATG ATAACAATTT TCATGGCGTT TGTATGTTTT
551 GCAATGAAGA ATTCCTTGGA AACAGATCTG TTATTTTGAA CCACATGGCC
601 AGAGAACATG CTTTCAACAT TGGATTGCCA GACAACATTG TAAACTGCAA
651 TGAATTTTTG TGTACATTAC AGAAAAAGCT TGACAATTTG CAGTGCTTGT
701 ACTGTGAGAA GACCTTCAGG GGCAAAAATA CACTTAAAGA TCACATGAGG
751 AAAAAACAGC ATCGTAAGAT TAATCCTAAG AACAGAGAAT ATGACAGATT
801 TTATGTCATC AATTATTTGG AACTTGGAAA ATCGTGGGAG GAAGTTCAGT
851 TGGAAGATGA TCGGGAGTTG CTGGACCATC AGGAAGATGA CTGGTCTGAT
901 TGGGAAGAAC ACCCTGCCTC TGCAGTCTGC TTATTTTGTG AAAAGCAAGC
951 AGAAACAATT GAGAAGTTGT ATGTCCACAT GGAGGATGCA CACGAATTTG
1001 ATCTTCTCAA AATAAAGTCA GAACTTGGAT TAAATTTCTA TCAGCAAGTG
1051 AAACTGGTCA ATTTTATTCG GAGGCAAGTT CACCAATGCA GATGATGGCT
1101 GCCATGTGAA GTTCAAATCC AAAGCAGACT TAAGAACTCA CATGGAAGAA
1151 ACTAAACACA CTTCGCTGCT CCCCGATAGA AAGACGTGGG ATCAACTGGA
1201 GTATTATTTT CCAACCTATG AAAATGACAC TCTCCTGTGT ACACTATCTG
1251 ACAGTGAAAG TGACCTGACA GCTCAGGAAC AAAATGAAAA TGTTCCCATC
1301 ATCAGTGAAG ATACATCTAA ACTGTATGCT TTGAAACAAA GCAGTATTTT
1351 GAACCAGTTG CTACTATAAG AGTACTTGAA AACCTAGAAG AAACTACCAC
1401 AGAAGCAATT TTTCATGTTT TTCTCCTATG AGACAGATAT GAAAGAACAA
1451 TTTAAATTTG AACATCAACA AAAGATTGGT CCTTGGTGAA ATAAACTTTT
1501 CAAAAATGAA TGTTCTTTTC AAAAAATAAA GTAGAAAAAT GCACTTACTA
1551 AGAACATGAA AAAAAAATGA AGTAGGAAAA TAAGATGAAG ACTTTGTATT
1601 TTGGCTGTAA AGTTTTATTG TGTGATCATC TTAAATTATC TCACTTCATT
1651 AAACTCATAA TTATATATAG AAGTATATGT CAATTACAAA GAAATGAAAT
1701 GTTCAAATTA TTTATAAACC TGATTTTTCA ATCAGCGAAA AAAAAAAAAA
1751 AAAAAA



BLAST Results 



Entry AC004112 from database EMBL: 

Homo sapiens BAC clone RG313E03 from 7q31, complete sequence. 
Score = 2660, P = 3.0e-241, identities = 534/535 
> 10 exons 

Entry AC004111 from database EMBL: 

Homo sapiens BAC clone RG103H13 from 7q31, complete sequence. 
Score = 598, P = 5.8e-17, identities = 128/137 
1 exon 



Medline entries

No Medline entry



Peptide information for frame 1 



ORF from 253 bp to 1092 bp; peptide length: 280 
Category: similarity to unknown protein 



1 MIIEHKIVIA DVKLVADFQR YILYWRKRFT EQPITDFCSV IRINSTAPFE 
51 EQENYFLLCD VLPEDRILRE ELQKQRLREI LEQQQQERND NNFHGVCMFC 
101 NEEFLGNRSV ILNHMAREHA FNIGLPDNIV NCNEFLCTLQ KKLDNLQCLY 
151 CEKTFRGKNT LKDHMRKKQH RKINPKNREY DRFYVINYLE LGKSWEEVQL 
201 EDDRELLDHQ EDDWSDWEEH PASAVCLFCE KQAETIEKLY VHMEDAHEFD 
251 LLKIKSELGL NFYQQVKLVN FIRRQVHQCR

BLASTP hits

Entry CEF46B6_6 from database TREMBLNEW:

product: "F46B6.7"; Caenorhabditis elegans cosmid F46B6
>TREMBL:CEF46B6_6 product: "F46B6.7"; Caenorhabditis elegans cosmid
F46B6

Score = 630, P = 1.1e-61, identities = 123/289, positives = 183/289
Entry AF059531_1 from database TREMBLNEW:

gene: "PRMT3"; product: "protein arginine N-methyltransferase 3"; Homo
sapiens protein arginine N-methyltransferase 3 (PRMT3) mRNA, partial
cds. >TREMBL:AF059531_1 gene: "PRMT3"; product: "protein arginine
N-methyltransferase 3"; Homo sapiens protein arginine
N-methyltransferase 3 (PRMT3) mRNA, partial cds.
Score = 120, P = 1.5e-04, identities = 23/78, positives = 42/78

Entry YB9M_YEAST from database SWISSPROT:

34.7 KD PROTEIN IN SHM1-MRPL37 INTERGENIC REGION. 

Score = 112, P = 4.6e-04, identities = 43/165, positives = 71/165 



Alert BLASTP hits for DKFZphfkd2_47a4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_47a4, frame 1

Report for DKFZphfkd2_47a4.1

[LENGTH] 280
[MW] 33921.94
[pi] 5.63
[HOMOL] TREMBL:CEF46B6_5 gene: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 1e-56
[BLOCKS] BL01032B Protein phosphatase 2C proteins
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[PROSITE] MYRISTYL 1
[PROSITE] ZINC_FINGER_C2H2 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] Zinc finger, C2H2 type
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 8.21 %



SEQ MIIEHKIVIADVKLVADFQRYILYWRKRFTEQPITDFCSVIRINSTAPFEEQENYFLLCD 

SEG 

PRD cccccceeehhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccchhhhheeeecc 

SEQ VLPEDRILREELQKQRLREILEQQQQERNDNNFHGVCMFCNEEFLGNRSVILNHMAREHA 

SEG XXXXXXXXXXXXXXXXXXXXXXX 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccccceeeehhhhhhhh 

SEQ FNIGLPDNIVNCNEFLCTLQKKLDNLQCLYCEKTFRGKNTLKDHMRKKQHRKINPKNREY

SEG

PRD hcccccccccchhhhhhhhhhhhhhhhheeecccccccchhhhhhhhhhhcccccccccc 

SEQ DRFYVINYLELGKSWEEVQLEDDRELLDHQEDDWSDWEEHPASAVCLFCEKQAETIEKLY 

SEG 

PRD ceeeeeeeeccccchhhhhhhhcchhhhhhcccccccccccccccchhhhhhhhhhhhhh 

SEQ VHMEDAHEFDLLKIKSELGLNFYQQVKLVNFIRRQVHQCR 

SEG 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccc 



Prosite for DKFZphfkd2_47a4.1

PS00001 44->48 ASN_GLYCOSYLATION PDOC00001
PS00001 107->111 ASN_GLYCOSYLATION PDOC00001
PS00004 27->31 CAMP_PHOSPHO_SITE PDOC00004
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005
PS00006 160->164 CK2_PHOSPHO_SITE PDOC00006
PS00006 194->198 CK2_PHOSPHO_SITE PDOC00006
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006
PS00007 178->185 TYR_PHOSPHO_SITE PDOC00007
PS00007 13->22 TYR_PHOSPHO_SITE PDOC00007
PS00008 124->130 MYRISTYL PDOC00008
PS00028 148->171 ZINC_FINGER_C2H2 PDOC00028



Pfam for DKFZphfkd2_47a4.1



HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR. .T.H* 

C + C+KTFR + +L+ HMR H 
Query 148 CLY--CEKTFRGKNTLKDHMRKK-QH 170

DKFZphfkd2_4b6



group: kidney derived 

DKFZphfkd2_4b6 encodes a novel 133 amino acid protein with similarity to Homo sapiens clone
25003 partial CDS.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific
genes.



similarity to Homo sapiens clone 25003 

complete cDNA, complete cds, few EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1936 bp 

Poly A stretch at pos. 1916, polyadenylation signal at pos. 1890



1 GGGAGACTTG CAATGAAGTT AGAATGAACA GGAGGAGTCT GCAGCTTTTC 
51 AGTGCCTGGG ATAACTATAG TTTAAAGATC ATTGTGTAAA ATAGGATTTT 
101 TAGTCAGCAT GCATTGTTTT AAACCGACTA ACTGATAGCC TAAAACTTTA 
151 TTTTTGCATT TTGCCAATCC TTGGAGTTTT GTTTTGCAGA ATTAAGAAAA 
201 AAATGAATGT ATGATCATCT GAAAAGGGCT TTCTCTCAAT CCCACTTCAT 
251 GGCATGACCT CTGCTGGATC ATTAGTTCTA GCCAGAGAAG TAGCAAAGGA 
301 ACATGACGTC TGAGACCTCC CTTCCCTCAT CAGTGGGGCT GACTGAGCTG 
351 GGGGCTTGAA GCCGGAGGTA ACCTTTCCTG TCGAATGTTT CTTTAGAGAA 
401 TGGCAATGGT CTCTGCGATG TCCTGGGTCC TGTATTTGTG GATAAGTGCT
451 TGTGCAATGC TACTCTGCCA TGGATCCCTT CAGCACACTT TCCAGCAGCA
501 TCACCTGCAC AGACCAGAAG GAGGGACGTG TGAAGTGATA GCAGCACACC 
551 GATGTTGCAA CAAGAATCGC ATTGAGGAGC GGTCACAAAC AGTAAAGTGT 
601 TCCTGTCTAC CTGGAAAAGT GGCTGGAACA ACAAGAAACC GGCCTTCTTG 
651 CGTCGATGCC TCCATAGTGA TTTGGAAATG GTGGTGTGAG ATGGAGCCTT 
701 GCCTAGAAGG AGAAGAATGT AAGACACTCC CTGACAATTC TGGATGGATG 
751 TGCGCAACAG GCAACAAAAT TAAGACCACG AGAATTCACC CAAGAACCTA 
801 ACAGAAGCAT TTGTGGTAGT AAAGGAAAAC CAACCCTCTG GAAAATACAT 
851 TTTGAGAATC TCAAACATCT CACATATATA CAAGCCAAAT GGATTTCTTA 
901 CTTGCACTTT GACTGGCTAC CAGATAATCA CAGTGCGTTT AGTGTGTGTA 
951 ACGAAATATC CTACAGTGAG AAGACACAGC GTTTTGGCAT CACCATGGAA 
1001 AGTGGGCTTA AAAAAGGGTC TTCTCAGTGA AATTTTTGGG CATCATGAAG 
1051 AACGATCAAC TATCTTCTAA TTTGAATCTA TAGTTACTTT GTACCATTTG 
1101 AAATATATGT ATATATATAT ATATAATATT TTGAAATATT ATCTATTCTC 
1151 TTCAAGAAAT GAACAGTACC ACAGTTTGAG ACGGCTGGTG TACCCCTTTG 
1201 AGTTTTGGAT GTTTTGTCTG TTTTGCTTTG TTTTGTTAGT CATTTCTTTT 
1251 TCTAACGGCA AGGAAGATAT GTGCCCTTTT GAGAATTCAA GATGGCACTG 
1301 ACACGGGAAG GCCAGCTACA GGTGGACTCC TGGAATTTGA GGCATCATAA 
1351 TGATACTGAA TCAAGAACTT CCTTCTGCTT CTACCAGATG GCCCAAGGAA 
1401 GCACATCGTC CTGTTTTATT GCTTTCTACC CTGTGCAATA TTAGCATGCA 
1451 AGCTTGGCTT ACATAGTCAT ACTTTATATT CAATTGATAT ATAATAACCG
1501 TTCTAACCTC TTCCAGGAAA ATATTTTTAG AACTACTAGC TTTTCCACTT 
1551 AGAAGAAAAT GAGGATTCTT AAGGGAGCCA CTCCACCATG CTATTAAGAC 
1601 TCTGGCAGAG TTATGGGTAG GATATGGATC CCTACATGAA TAAGTCCTGT 
1651 AAATACAATG TCTTAAGGCT TTGTATAGCT GTCCTAGACT GCAGAAATGT 
1701 CCTCTGATTA AATCCAAAGT CTGGCATCGT TAACTACATA GTGCTGTAGC 
1751 AACAAGTCTT ATCATGGCAT CTCTTTCTAT GTTTGGTTTG CTTTTTCCAA 
1801 GAGTATTCAG GTCTCCTCTT GTGAGATAGG AAGGCCATGA AAACAATTAG 
1851 ATTTCAAGAT GATCTATGTG ACCAAATGTT GGACAGCCCT ATTAAAGTGG 
1901 TAAACAACTT CTTTCTAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry

Peptide information for frame 1



ORF from 400 bp to 798 bp; peptide length: 133 
Category: similarity to unknown protein 
Classification: no clue 



1 MAMVSAMSWV LYLWISACAM LLCHGSLQHT FQQHHLHRPE GGTCEVIAAH 
51 RCCNKNRIEE RSQTVKCSCL PGKVAGTTRN RPSCVDASIV IWKWWCEMEP 
101 CLEGEECKTL PDNSGWMCAT GNKIKTTRIH PRT 
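
The ORF coordinates and peptide length reported above can be cross-checked with a few lines of Python. This is a minimal sketch and not part of the original analysis pipeline: the helper names `orf_peptide_length` and `translate` are hypothetical, the standard genetic code is assumed, and the ORF coordinates are assumed to be 1-based, inclusive, and to exclude the stop codon. The fragment is bases 400 to 450 as printed in the nucleotide listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon, ignoring spaces and any trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Base 400 (last base of the row starting at 351) followed by the row starting at 401.
fragment = "A" + "TGGCAATGGT CTCTGCGATG TCCTGGGTCC TGTATTTGTG GATAAGTGCT"

print(orf_peptide_length(400, 798))  # codons between bp 400 and bp 798
print(translate(fragment))           # first residues of the frame 1 peptide
```

The translated fragment can be compared by eye with the first line of the peptide listing; the same length check applies to the other ORFs in this document.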



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfkd2_4b6, frame 1

TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA
sequence, partial cds., N = 1, Score = 242, P = 1.7e-20



>TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA 
sequence, partial cds. 
Length = 165 



HSPs : 



Score = 242 (36.3 bits), Expect = 1.7e-20, P = 1.7e-20 
Identities = 44/89 (49%), Positives = 58/89 (65%) 



Query: 42 GTCEVIAAHRCCNKNRIEERSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPC 101 

 GTCE++ R ++ R QT +C+C G++AGTTR RP+CVDA I+ K WC+M PC

Sbjct: 76 GTCEIVTLDRDSSQPRRTIARQTARCACRKGQIAGTTRARPACVDARIIKTKQWCDMLPC 135

Query: 102 LEGEECKTLPDNSGWMCAT-GNKIKTTRI 129 

LEGE C L + SGW C G +IKTT + 
Sbjct: 136 LEGEGCDLLINRSGWTCTQPGGRIKTTTV 164 
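
The "Identities" and "Positives" figures in the HSP header are simple column counts over the gapped alignment. The sketch below is illustrative only: `count_identities` and `truncated_percent` are hypothetical helper names, and rounding the percentage down is an assumption that happens to reproduce the 49% and 65% printed above. Counting positives would additionally need the substitution matrix used by the search, which this listing does not state, so only identities are counted here.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both rows carry the same residue (gaps never match)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, sbjct))


def truncated_percent(part, total):
    """Integer percentage, rounded down (an assumption about the report format)."""
    return 100 * part // total


# Second block of the HSP above (query 102-129, subject 136-164), gap included.
query = "LEGEECKTLPDNSGWMCAT-GNKIKTTRI"
sbjct = "LEGEGCDLLINRSGWTCTQPGGRIKTTTV"

print(count_identities(query, sbjct))
print(truncated_percent(44, 89), truncated_percent(58, 89))
```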



Pedant information for DKFZphfkd2_4b6, frame 1

Report for DKFZphfkd2_4b6.1

[LENGTH] 133
[MW] 15030.64
[pi] 8.49
[HOMOL] TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA sequence, partial cds. 4e-20

[KW] Alpha_Beta 

[KW] SIGNAL_PEPTIDE 26 



SEQ MAMVSAMSWVLYLWISACAMLLCHGSLQHTFQQHHLHRPEGGTCEVIAAHRCCNKNRIEE 
PRD ccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccceeeeeeecccccchhhh 



SEQ RSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPCLEGEECKTLPDNSGWMCAT

PRD hhhhhhccccccccccccccccccceeeeeehhhhhhccccccccceeeecccccceeec 



SEQ GNKIKTTRIHPRT

PRD ccccccccccccc 



(No Prosite data available for DKFZphfkd2_4b6.1)
(No Pfam data available for DKFZphfkd2_4b6.1)

DKFZphfkd2_4c8



group: kidney derived 

DKFZphfkd2_4c8 encodes a novel 153 amino acid protein with partial similarity to
huntingtin-associated protein HAP1.

The novel protein contains a leucine zipper involved in protein-protein interaction.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of kidney-specific
genes.



similarity to KIAA0549 and HAP1 

potential frame shift at bp ~1350-1500, will be checked

Sequenced by GBF 

Locus : unknown 

Insert length: 3182 bp 

Poly A stretch at pos. 3162, polyadenylation signal at pos. 3135



1 GGGCTTCCCC CATAGAATTT TTCTTTTCAT TGCCCACTTT ACTGTTTTGG 
51 CTCCAGACTG TCGTTAAGAA TGTACAGCCT AATTCTGGTG TGTTTCGGGA 
101 TATTCTTCTG TCCAGTATTC TGGAAGGGCG GGGAGGCATG GCAGCGTTTT 
151 ACTTGACGTT GATGGTGCTG TGAAGTCCAT TCTTTCCTCT GCAAGACTAC 
201 TGACTATGCA GAAATTTATC GAAGCGGATT ATTATGAACT AGACTGGTAT 
251 TATGAAGAAT GCTCGGATGT TTTATGTGCT GAAAGAGTTG GCCAGATGAC 
301 TAAGACATAT AATGACATAG ATGCTGTCAC TCGGCTTCTT GAGGAGAAAG 
351 AGCGGGATTT AGAATTGGCC GCTCGCATCG GCCAGTCGTT GTTGAAGAAG 
401 AACAAGACCC TAACCGAGAG GAACGAGCTG CTGGAGGAGC AGGTGGAACA
451 CATCAGGGAG GAGGTGTCTC AGCTCCGGCA TGAGCTGTCC ATGAAGGATG
501 AGCTGCTTCA GTTCTACACC AGCGCAGCGG AGGAGAGTGA GCCCGAGTCC 
551 GTTTGCTCAA CCCCGTTGAA GAGGAATGAG TCGTCCTCCT CAGTCCAGAA 
601 TTACTTTCAT TTGGATTCTC TTCAAAAGAA GCTGAAAGAC CTTGAAGAGG 
651 AGAATGTTGT ACTTCGATCC GAGGCCAGCC AGCTGAAGAC AGAGACCATC 
701 ACCTATGAGG AGAAGGAGCA GCAGCTGGTC AATGACTGCG TGAAGGAGCT 
751 GAGGGATGCC AATGTCCAGA TTGCTAGTAT CTCAGAGGAA CTGGCCAAGA
801 AGACGGAAGA TGCTGCCCGC CAGCAAGAGG AGATCACACA CCTGCTATCG 
851 CAAATAGTTG ATTTGCAGAA AAAGGCAAAA GCTTGCGCAG TGGAAAATGA 
901 AGAACTTGTC CAGCATCTGG GGGCTGCTAA GGATGCCCAG CGGCAGCTCA 
951 CAGCCGAGCT GCGTGAGCTG GAGGACAAGT ACGCAGAGTG CATGGAGATG 
1001 CTGCATGAGG CGCAGGAGGA GCTGAAGAAC CTCCGGAACA AAACCATGCC 
1051 CAATACCACG TCTCGGCGCT ACCACTCACT GGGCCTGTTT CCCATGGATT 
1101 CCTTGGCAGC AGAGATTGAG GGAACGATGC GCAAGGAGCT GCAGTTGGAA 
1151 GAGGCCGAGT CTCCAGACAT CACTCACCAG AAGCGTGTCT TTGAGACAGT 
1201 AAGAAACATC AACCAGGTTG TCAAGCAGAG ATCTCTGACC CCTTCTCCCA 
1251 TGAACATCCC CGGCTCCAAC CAGTCCTCGG CCATGAACTC CCTCCTGTCC 
1301 AGCTGCGTCA GCACCCCCCG GTCCAGCTTC TACGGCAGCG ACATAGGCAA 
1351 CGTCGTCCTC GACAACAAGA CCAACAGCAT CATTCTGGAA ACAGAGGCAG 
1401 CCGACCTGGG AAACGATGAG CGGAGTAAGA AGCCGGGGAC GCCGGGCACC 
1451 CCCAGGCTCC CACGACCTGG AGACGGCGCT GAGGCGGCTG TCCCTGCGCC
1501 GGGAGAACTA CCTCTCGGAG AGGAGGTTCT TTGAGGAGGA GCAAGAGAGG 
1551 AAGCTCCAGG AGCTGGCGGA GAAGGGCGAG CTGCGCAGCG GCTCCCTCAC 
1601 ACCCACTGAG AGCATCATGT CCCTCGGCAC GCACTCCCGC TTCTCCGAGT 
1651 TCACCGGCTT CTCTGGCATG TCCTTCAGCA GCCGCTCCTA CCTGCCTGAG 
1701 AAGCTCCAGA TCGTGAAGCC GCTGGAAGGT GATCACGCGG GGCCTCGGCC 
1751 CCTCTCTGTC CTCCTGGGGG ACTCCCTTTG GTCCCTGATC CACCTGCGGA 
1801 AGGCGGGGCA CCTCTGTCAC GCCTACTCCT TTTTCTTCCG CGACAGCCAC 
1851 CCGCGCTGCT GGTTTGAGTT CCTCTGAGGG TGGTGCTCAG CCTAGGCCTC 
1901 CGTCCCTCCC CTCTGGCTGG CAGGTGTGAC AATGCACACA TAGGCCATGA 
1951 AACTCGCCGA GGAAAGACAA GCATGTGCAC TGTGGTCTTC TAGTTCTTTC 
2001 CTTTGCCTTT AGAACCTTAG AAATAAAAAC TTTTGTGGCG GTAGAGGCAC 
2051 TGCTAACTGA TTCAAAAATT AATTAGGTTT TGCCTGTGGG TGTGAGGAAT 
2101 GCAGAAAATT AATGCTTTAG CTTTTCTGCA GTTTTGGTGT CGGGGAGAGG 
2151 TTCCAAGCAA ACTCTATTAA ATGGGGATTT TTTTTTCCCC ATAACCACCT 
2201 GAATGTGATT TGTGGGCTTA TGTGTTCTGA TTTGAACTTC ATATAGCAAG 
2251 GTTGTGGCTT TTGGCAGATG CAGTATGTTC TGAGCGCGGC TCCTAGAGTC 
2301 TACAATTTGG AGTCCAGGAA GGGGTGGCTG TGGAGACAAG TGAGTTTTGT 
2351 ACCTCCGTAA GCCACCCTTT TTCAGGGTCA GTTCATGTGT TAGTATCAGG 
2401 GGCATCTCAG ATGATTAAAC TCATGGGAAA AACTTCCTCC TTCCCTCTCT
2451 CCCTCTTGCC CTCCTGCCTC TTTTTTTTTT TTTTTTTTTT AATTTGGGCA
2501 CTTATAAAAT GTTTTCCCTC TACCTGCTGC TACTCTGCCA AGAGCCACCA 
2551 AGTGCTTATA TTTTTCATTT TTTACTCCTT TAGTTTGGAA AGCCATATAC 
2601 GTTTGAGAAG GTGTTTTAAA ACTCTGTGTT ACACTTACGA TGCAAAGCCA 
2651 AATCAGAACT TCTGTAAGGC AGAACTTTCC CAACTTTAAA AAAATTATTG
2701 TCCCCTCTAG GAGCCTTCTT AGACGTTTTT TCCTAATCAC CCCCCAAAGA
2751 CATTTTAATA CCACATATAT ATTGTTTATG TACTATATGT ATATACATAA 
2801 ACAATACATA AGCAATACAT CTGTGGTATT AAAATTAAAA AGAATCCAAT 
2851 TATGTTTACC TCAAAAGAAC CTGTTTTTGC TTCTTGGGAG CAATATTGCC 
2901 CCTGTGAGAC TGCATGCTAT AAGGTAAGGT TGTGCTTGTT AAAGACCCAA 
2951 GACATGACTG GGTTCCACAG TCTCCAAAGG AAGAGGGTGG GCTAGTTTGT 
3001 TTTTATTATT ATTTTAAAAT TGTATAATTG GGGTCTTTCT TAGAGTTCAG 
3051 AAAAGGTATA GCTTACTCTT TTTTAATTGT TTATTTAGTT GTAAGCTTAG 
3101 TGATTGTTTT CTGATCCACA TTGTGTGTGT TCTTCAATAA AATCTTTCAT 
3151 TTCTGCAATT TTAAAAAAAA AAAAAAAAAA AA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 206 bp to 1531 bp; peptide length: 442 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (139-161) 



1 MQKFIEADYY ELDWYYEECS DVLCAERVGQ MTKTYNDIDA VTRLLEEKER 

51 DLELAARIGQ SLLKKNKTLT ERNELLEEQV EHIREEVSQL RHELSMKDEL 

101 LQFYTSAAEE SEPESVCSTP LKRNESSSSV QNYFHLDSLQ KKLKDLEEEN 

151 VVLRSEASQL KTETITYEEK EQQLVNDCVK ELRDANVQIA SISEELAKKT 

201 EDAARQQEEI THLLSQIVDL QKKAKACAVE NEELVQHLGA AKDAQRQLTA 

251 ELRELEDKYA ECMEMLHEAQ EELKNLRNKT MPNTTSRRYH SLGLFPMDSL 

301 AAEIEGTMRK ELQLEEAESP DITHQKRVFE TVRNINQVVK QRSLTPSPMN 

351 IPGSNQSSAM NSLLSSCVST PRSSFYGSDI GNVVLDNKTN SIILETEAAD 

401 LGNDERSKKP GTPGTPRLPR PGDGAEAAVP APGELPLGEE VL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 2

PIR:S72555 huntingtin-associated protein HAP1 - human (fragment), N =
1, Score = 234, P = 8.6e-19

TREMBL:CEUT27A3_7 gene: "T27A3.1"; Caenorhabditis elegans cosmid
T27A3., N = 1, Score = 226, P = 9.9e-16

PIR:S67495 huntingtin-associated protein HAP1-A - rat, N = 1, Score = 
215, P = 1.6e-14 



>PIR:S72555 huntingtin-associated protein HAP1 - human (fragment) 
Length = 320 

HSPs : 

Score = 234 (35.1 bits), Expect = 8.6e-19, P = 8.6e-19
Identities = 66/189 (34%), Positives = 110/189 (58%) 

Query: 109 EESEPESVCSTPLKRNE--SSSSVQNYFH---LDSLQKKLKDLEEENVVLRSEASQLKTE 163

 EE+E + C+ P + S ++ + H L++LQ+KL+ LEEEN LR EASQL T
Sbjct: 28 EEAEEDLQCAHPCDAPKLISQEALLHQHHCPQLEALQEKLRLLEEENHQLREEASQLDT- 86

Query: 164 TITYEEKEQQLVNDCVKELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKK 223

 E++EQ L+ +CV++ +A+ Q+A +SE L + E+ RQQ+E+ L +Q++ LQ++
Sbjct: 87 LEDEEQMLILECVEQFSEASQQMAELSEVLVLRLENYERQQQEVARLQAQVLKLQQR 143

Query: 224 AKACAVENEELVQHLGAAKDAQRQLTAE--LRELEDKYAECME--MLHEAQEELKNL-RN 278
 + E E+L + L + K+ Q QL E L ++ AE + + + + + RN
Sbjct: 144 CRMYGAETEKLQKQLASEKEIQMQLQEEETLPGFQETLAEELRTSLRRMISDPVYFMERN 203

Query: 279 KTMP--NTTSRRY 289

 MP +T+S RY
Sbjct: 204 YEMPRGDTSSLRY 216



Peptide information for frame 3 



ORF from 1416 bp to 1874 bp; peptide length: 153 
Category: similarity to known protein 
Classification: unset 



1 MSGVRSRGRR APPGSHDLET ALRRLSLRRE NYLSERRFFE EEQERKLQEL 
51 AEKGELRSGS LTPTESIMSL GTHSRFSEFT GFSGMSFSSR SYLPEKLQIV
101 KPLEGDHAGP RPLSVLLGDS LWSLIHLRKA GHLCHAYSFF FRDSHPRCWF 
151 EFL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4c8, frame 3

TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds . , N = 1, Score = 252, P 
= 5.5e-21 



>TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds. 
Length = 469

HSPs: 

Score = 252 (37.8 bits), Expect = 5.5e-21, P = 5.5e-21 
Identities = 57/98 (58%), Positives = 69/98 (70%) 

Query: 8 GRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGSLTPTESI 67 

G+ P G DL TAL RLSLRR+NYLSE++FF EE +RK+Q LA++ E SG +TPTES+ 
Sbjct: 27 GQPGPSGDSDLATALHRLSLRRQNYLSEKQFFAEEWQRKIQVLADQKEGVSGCVTPTESL 86 

Query: 68 MSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEG 105 

SL T SE T S S R ++PEKLQIVKPLEG 
Sbjct: 87 ASLCTTQ--SEITDLSSAS-CLRGFMPEKLQIVKPLEG 121 



Pedant information for DKFZphfkd2_4c8, frame 2

Report for DKFZphfkd2_4c8.2

[LENGTH] 442
[MW] 50020.14
[pi] 4.77
[HOMOL] TREMBL:AF040723_1 product: "neuroan1"; Homo sapiens neuroan1 mRNA, complete cds. 5e-2
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-08
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YIL149c] 5e-08
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-08
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YIL138c] 6e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR130c] 2e-07
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-06
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-06
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-06
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 4e-06
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 2e-05
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 2e-05
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 5e-05
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 5e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL079c] 5e-05
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 1e-04
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 1e-04
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YNL272c] 3e-04
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YNL272c] 3e-04
[BLOCKS] BL01289B
[BLOCKS] BL00415M Synapsins proteins
[EC] 3.6.1.32 Myosin ATPase 2e-07
[PIRKW] tandem repeat 2e-07
[PIRKW] heterodimer 1e-06
[PIRKW] endocytosis 9e-07
[PIRKW] heart 1e-06
[PIRKW] transmembrane protein 4e-07
[PIRKW] zinc finger 9e-07
[PIRKW] metal binding 9e-07
[PIRKW] DNA binding 3e-06
[PIRKW] muscle contraction 2e-07
[PIRKW] acetylated amino end 3e-06
[PIRKW] actin binding 2e-07
[PIRKW] mitosis 1e-06
[PIRKW] microtubule binding 1e-06
[PIRKW] ATP 2e-07
[PIRKW] chromosomal protein 1e-06
[PIRKW] receptor 3e-08
[PIRKW] thick filament 2e-07
[PIRKW] phosphoprotein 8e-06
[PIRKW] glycoprotein 3e-08
[PIRKW] skeletal muscle 3e-06
[PIRKW] DNA condensation 1e-06
[PIRKW] alternative splicing 2e-06
[PIRKW] coiled coil 2e-07
[PIRKW] P-loop 2e-07
[PIRKW] heptad repeat 4e-07
[PIRKW] methylated amino acid 2e-07
[PIRKW] peripheral membrane protein 9e-07
[PIRKW] cardiac muscle 6e-06
[PIRKW] hydrolase 2e-07
[PIRKW] muscle 2e-06
[PIRKW] cytoskeleton 2e-06
[PIRKW] Golgi apparatus 4e-07
[PIRKW] calmodulin binding 9e-07
[SUPFAM] myosin motor domain homology 2e-07
[SUPFAM] tropomyosin TPM1 2e-06
[SUPFAM] giantin 4e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06
[SUPFAM] human early endosome antigen 1 9e-07
[SUPFAM] unassigned kinesin-related proteins 4e-07
[SUPFAM] M5 protein 8e-08
[SUPFAM] cytoskeletal keratin 3e-06
[SUPFAM] myosin heavy chain 2e-07
[SUPFAM] conserved hypothetical P115 protein 1e-06
[SUPFAM] centromere protein E 1e-06
[SUPFAM] pleckstrin repeat homology 2e-06
[SUPFAM] kinesin motor domain homology 4e-07
[PROSITE] LEUCINE_ZIPPER 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 6.79 %
[KW] COILED_COIL 27.15 %



SEQ MQKFIEADYYELDWYYEECSDVLCAERVGQMTKTYNDIDAVTRLLEEKERDLELAARIGQ 

SEG xxxxxxxxxxxxxxx. . . 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS C 

SEQ SLLKKNKTLTERNELLEEQVEHIREEVSQLRHELSMKDELLQFYTSAAEESEPESVCSTP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LKRNESSSSVQNYFHLDSLQKKLKDLEEENVVLRSEASQLKTETITYEEKEQQLVNDCVK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ ELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKKAKACAVENEELVQHLGA

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ AKDAQRQLTAELRELEDKYAECMEMLHEAQEELKNLRNKTMPNTTSRRYHSLGLFPMDSL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ AAEIEGTMRKELQLEEAESPDITHQKRVFETVRNINQVVKQRSLTPSPMNIPGSNQSSAM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhh 

COILS 

SEQ NSLLSSCVSTPRSSFYGSDIGNVVLDNKTNSIILETEAADLGNDERSKKPGTPGTPRLPR

SEG xxxxxxxxxxx 

PRD hhhhhcccccccccccccccceeeeeccccceeecccccccccccccccccccccccccc 

COILS 

SEQ PGDGAEAAVPAPGELPLGEEVL 

SEG xxxx 

PRD cccccccccccccccccccccc 

COILS 

Prosite for DKFZphfkd2_4c8.2
PS00029 139->161 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphfkd2_4c8.2)
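
The Prosite hit above can be mapped back onto the frame 2 peptide as follows. This is a sketch under stated assumptions: the accession is read as PS00029, the coordinates are taken to be 1-based with the end position one past the last residue of the motif (the other Prosite hits in this listing are consistent with that reading), and the pattern used is the generic leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L. The function names `parse_hit` and `motif_residues` are hypothetical.

```python
import re

# First 200 residues of the frame 2 peptide, copied from the peptide listing above.
PEPTIDE = (
    "MQKFIEADYYELDWYYEECSDVLCAERVGQMTKTYNDIDAVTRLLEEKER"
    "DLELAARIGQSLLKKNKTLTERNELLEEQVEHIREEVSQLRHELSMKDEL"
    "LQFYTSAAEESEPESVCSTPLKRNESSSSVQNYFHLDSLQKKLKDLEEEN"
    "VVLRSEASQLKTETITYEEKEQQLVNDCVKELRDANVQIASISEELAKKT"
)

# Leucine zipper pattern: four leucines spaced seven residues apart.
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def parse_hit(line):
    """Split a hit line of the form 'ACCESSION start->end NAME DOC'."""
    accession, span, name, doc = line.split()
    start, end = (int(x) for x in span.split("->"))
    return accession, start, end, name, doc


def motif_residues(peptide, start, end):
    """Residues start..end-1 (1-based); the exclusive end is an assumption."""
    return peptide[start - 1:end - 1]


accession, start, end, name, doc = parse_hit("PS00029 139->161 LEUCINE_ZIPPER PDOC00029")
segment = motif_residues(PEPTIDE, start, end)
print(name, segment)
print(bool(LEUCINE_ZIPPER.fullmatch(segment)))
```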

Pedant information for DKFZphfkd2_4c8, frame 3

Report for DKFZphfkd2_4c8.3

[LENGTH] 153 

[MW] 17642.03 

[pi] 9.38 

[HOMOL] TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo sapiens 

mRNA for KIAA0549 protein, partial cds. 2e-12 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 12.42 % 

SEQ MSGVRSRGRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGS

SEG xxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccc 

SEQ LTPTESIMSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEGDHAGPRPLSVLLGDS 

SEG 

PRD cccccceeeccccceeeccccccccccccccccchhhhhhhhcccccccccceeeeeccc 

SEQ LWSLIHLRKAGHLCHAYSFFFRDSHPRCWFEFL

SEG 

PRD chhhhhhhhhcccccceeeeecccccccccccc 

(No Prosite data available for DKFZphfkd2_4c8.3)
(No Pfam data available for DKFZphfkd2_4c8.3)

DKFZphfkd2_4k14



group: intracellular transport and trafficking 

DKFZphfkd2_4k14.3 encodes a novel 254 amino acid putative GTP-binding protein nearly identical
to Rab6.

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and
endocytotic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of
transport vesicles to their acceptor membranes.

Rab6 is a ubiquitous Ras-like GTPase involved in intra-Golgi transport.

The new protein can find application in modulating the transport of vesicles inside the Golgi
apparatus.



strong similarity to Rab6 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3084 bp 

Poly A stretch at pos. 3061, polyadenylation signal at pos. 3043



1 GGGGCACTCA GCAGGTTGGG CTGCGGCGGC GGCGGCTGGG GAAGCCGAAG
51 CGCCGCGCGT GAGAGATCCC GGATACATCT GCGGTTTGGG CTCCGCCACC
101 CTCCGTCTCT CTCCCGCAGG TCTCTGAGCC GGGTGCGGAA GGAGGGAACG
151 GCCCTAGCCT TGGGAAGCCA AAGCACACCC CTGGCTCCCG CCGACACCGC
201 CCTCCTTCCC TTCCCAGCCG CGGGCCTCGC TCCGTGCTCG GCTACTCTGC
251 CGGGAGGCGG CGGCGGCTGC CAGTCTGTGG CGAGCCCTGC TGCCCTCCAG
301 CCGGGCTTCT CCAGCCGGGC TCCTCCACCG GCCCTTGCAG GGGCACAGAG
351 AGCTCGGCGC CCGCCCTTCC GCTCGCCTTT TTCGTCAGCC GGCTGGAGGA
401 GCATCGGTCC GGGAGGTCTC TGGGCTGAGG CGGCGACAGC TCCTCTAGTT
451 CCACCATGTC CGCGGGCGGA GACTTCGGGA ATCCGCTGAG GAAATTCAAG
501 CTGGTGTTCC TGGGGGAGCA AAGCGTTGCA AAGACATCTT TGATCACCAG
551 ATTCAGGTAT GACAGTTTTG ACAACACCTA TCAGGCAATA ATTGGCATTG
601 ACTTTTTATC AAAAACTATG TACTTGGAGG ATGGAACAAT CGGGCTTCGG
651 CTGTGGGATA CGGCGGGTCA GGAACGTCTC CGTAGCCTCA TTCCCAGGTA
701 CATCCGTGAT TCTGCTGCAG CTGTAGTAGT TTACGATATC ACAAATGTTA
751 ACTCATTCCA GCAAACTACA AAGTGGATTG ATGATGTCAG AACAGAAAGA
801 GGAAGTGATG TTATCATCAC GCTAGTAGGA AATAGAACAG ATCTTGCTGA
851 CAAGAGGCAA GTGTCAGTTG AGGAGGGAGA GAGGAAAGCC AAAGGGCTGA
901 ATGTTACGTT TATTGAAACT AGGGCAAAAA CTGGATACAA TGTAAAGCAG
951 CTCTTTCGAC GTGTAGCAGC AGCTTTGCCG GGAATGGAAA GCACACAGGA
1001 CGGAAGCAGA GAAGACATGA GTGACATAAA ACTGGAAAAG CCTCAGGAGC
1051 AAACAGTCAG CGAAGGGGGT TGTTCCTGCT ACTCTCCCAT GTCATCTTCA
1101 ACCCTTCCTC AGAAGCCCCC TTACTCTTTC ATTGACTGCA GTGTGAATAT
1151 TGGCTTGAAC CTTTTCCCTT CATTAATAAC GTTTTGCAAT TCATCATTGC
1201 TGCCTGTCTC GTGGAGGTGA TCTATTAGCT TCACAAGCAC AAAAAAAGTC
1251 AGCGTCTTCA TTATTTATAT TTTACAAAAA GCCAAATTAT TTCAGCATAT
1301 TCCGGTGATA ACTTTAAAAA TTAGATACAT TTTCTTAACA TTTTTTTCTT
1351 TTTTAATGTT ATGATAATGT ACTTCAAAAT GATGGAAATC TCAACAGTAT
1401 GAGTATGGCT TGGTTAACGA GCAGTATGTT CACAGCCTGC TTTATCTCTC
1451 CTTGCTCTTC TCACCTCTCC CTTACCCCGT TCCCTATTTC CGTGTTCTTA
1501 CCTAGCCTCC CCCCACTTCC TCAAAACAAA CAAGAGATGG CAAAGCAGCA
1551 GTCCGACCAA GCCCACTGGA ATTATCCTTT AATTTTACAG ATACCACTTG
1601 CTGTAGGCTG TGGACCAAGA TGTCCAGAAT TATTCTTGAG CACTGATGTA
1651 AATTACTTAG ATCTTCTTTG AGGTCAGAAT TCAGCGATCA CGGTAGGCAG
1701 TGCTTGAATG AGAAAAGCCT CCTGGTGCAT CTTCAAAATG AGTCCTAAAG
1751 AACATACTGA GTACTTATAA GTAGCAGAAC ATAAAATGTA TTTCTGACTA
1801 ACACAAATGG TCCTTTCACA TGTGCTTTAT TAGACTCTGG GAGAGAAAAG
1851 TAACCAAGTG CTTCAGAACA GGTTTTTAGT ATTTACTTCT TCATGGTAAG
1901 ATAATGAAGT TCTAATGAAC TATTTCTCCC AAGGTTTTAA AATTGTCAAG
1951 AGTTATTCTG TTTGTTTAAA AAGTAAGAAA CCTCTGTAAG CAATAGATTT
2001 TGCTTGGGTT TTCTTTCTTA AAAAAATAAT ACTATGCAGG CAAGACACCA
2051 TAAAAGTTTA ATTCCTTACA GAAGAACCAG TGGAAGAATT TAAATTTGGC
2101 ACTACGATCA AAACTACTGA ATTAGCAGAA ATAACGATAT CTAAAGCTTA
2151 CCAGCAAAAG AACCCTCAGC AGAATAGCAA AAACTTTGCT CAGGACATTT
2201 GAGGTCAAAT TGAAGACGGA AGACGGAAAC CGGAAACCGT TTTCTTGTAA
2251 GCCCCTAGAG GCAGATCAGG TAAGCATACA TAGTAGAGGG AAAGGAGAGA
2301 ATGGAAATAA AACTGAATAT TATGCAGATT TATGCCTTAT TTTTTAGCAT
2351 TTTTTAAGGT TGGGTCTTTC AGGCTGGTTT TGGTTTGTAT TAGATCTGTA
2401 TAGTTTAGTG ATTTAGTTTT ATATTTAAGC TACGATTAAT ATTTTTTCTT
2451 TGGCGATATT TCTTTGCTTT TTTTTTTTAA CAACTTTCCA TTTTTAGATG
2501 TTTCGTTGAA TCTATTTAGA GCTTCACCAT GGCAATATGT ATTTCCCTTA
2551 AAACACTGCA AACAAATATA CTAGGAGTGT GCCCTTTTAA TCTTTACTAG
2601 TTATTGTGAG ACTGCTGTGT AAGCTAATAA ACACATTTGT AAAAACATTG
2651 TTTGCAGGAA GAAAACTTCG AGTTACAGGT CAGGAAAAGC CTGCTGAATT
2701 TATGTTGTAA ACGTTACTTA ACACAGTATA AAGATGAAAA GACAACAAAA
2751 GTATCTTCAT ACTTCCTCAT CCCCTCATTG CAACAAAACC TTAAACTGGG
2801 AGAACCTTAG TCCCCTCTCT TTCCTCTTCC TCCTCCACTT CCCACTTATT
2851 GCCACTTTGT AATATTCAGA GAGCACTTGG ATTATGGATC TGAATAGAGA
2901 AATGCTTACA GATAATCATT AGCCCACATA CCAGTAACTT ATACTTAAAG
2951 ATGGGATGGA GTTATAAAGT GCTTTTATAA TCCAATATAA TTGCTAAAGG
3001 CAAGGGTTGA CTCTTTGTTT TATTTTGACA TGGCATGTCC TGAAATAAAT
3051 ATTGGTTCAC TATGAAAAAA AAAAAAAAAA AAAA



BLAST Results 



No BLAST result 



Medline entries 



98382468: 
Rab proteins. 

97203146: 

GTP-bound forms of rab6 induce the redistribution of Golgi 
proteins into the endoplasmic reticulum. 



Peptide information for frame 3 



ORF from 456 bp to 1217 bp; peptide length: 254 
Category: strong similarity to known protein 
Classification: unset 

Prosite motifs: BACTERIAL_OPSIN_RET (45-57)



1 MSAGGDFGNP LRKFKLVFLG EQSVAKTSLI TRFRYDSFDN TYQAIIGIDF 
51 LSKTMYLEDG TIGLRLWDTA GQERLRSLIP RYIRDSAAAV VVYDITNVNS 
101 FQQTTKWIDD VRTERGSDVI ITLVGNRTDL ADKRQVSVEE GERKAKGLNV 
151 TFIETRAKTG YNVKQLFRRV AAALPGMEST QDGSREDMSD IKLEKPQEQT 
201 VSEGGCSCYS PMSSSTLPQK PPYSFIDCSV NIGLNLFPSL ITFCNSSLLP 
251 VSWR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4k14, frame 3

PIR:G34323 GTP-binding protein Rab6 - human, N = 1, Score = 944, P =
6.5e-95

TREMBL:CET25G12_2 gene: "T25G12.4"; Caenorhabditis elegans cosmid
T25G12., N = 1, Score = 756, P = 5.4e-75

TREMBL:NTNTRAF1 gene: "Nt-rab6"; Nicotiana tabacum SR1 Nt-rab6 mRNA,
complete cds., N = 1, Score = 698, P = 7.6e-69

TREMBL:D84314_1 product: "rab6"; Drosophila melanogaster mRNA for
rab6, complete cds., N = 1, Score = 836, P = 1.9e-83

PIR:T01588 small GTP-binding protein F16B22.10 - Arabidopsis thaliana,
N = 1, Score = 704, P = 1.8e-69



>PIR:G34323 GTP-binding protein Rab6 - human 
Length = 208 

HSPs: 

Score = 944 (141.6 bits), Expect = 6.5e-95, P = 6.5e-95 
Identities = 186/208 (89%), Positives = 190/208 (91%) 

Query:   1 MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 60 
           MS GGDFGNPLRKFKLVFLGEQSV KTSLITRF YDSFDNTYQA IGIDFLSKTMYLED 
Sbjct:   1 MSTGGDFGNPLRKFKLVFLGEQSVGKTSLITRFMYDSFDNTYQATIGIDFLSKTMYLEDR 60 

Query:  61 TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120 
           T+ L+LWDTAGQER RSLIP YIRDS  AVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 
Sbjct:  61 TVRLQLWDTAGQERFRSLIPSYIRDSTVAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 120 

Query: 121 ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 180 
           I LVGN+TDLADKRQVS+EEGERKAK LNV FIET AK GYNVKQLFRRVAAALPGMEST 
Sbjct: 121 IMLVGNKTDLADKRQVSIEEGERKAKELNVMFIETSAKAGYNVKQLFRRVAAALPGMEST 180 

Query: 181 QDGSREDMSDIKLEKPQEQTVSEGGCSC 208 
           QD SREDM DIKLEKPQEQ VSEGGCSC 
Sbjct: 181 QDRSREDMIDIKLEKPQEQPVSEGGCSC 208 
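The "Identities" figure in a BLAST HSP is the number of aligned columns in which query and subject carry the same residue, divided by the alignment length. A minimal sketch of that count, applied to the first 60 columns of the Rab6 alignment above; the function name `identity_stats` is hypothetical, and the sketch does not reproduce the similarity-matrix scoring behind the "Positives" figure.

```python
def identity_stats(query: str, sbjct: str):
    """Return (identical columns, alignment length, percent identity).

    Both strings must be rows of the same alignment (equal length);
    a column holding a gap character '-' never counts as identical.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query), 100.0 * identical / len(query)


# Columns 1-60 of the query / PIR:G34323 alignment.
QUERY_1_60 = "MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG"
SBJCT_1_60 = "MSTGGDFGNPLRKFKLVFLGEQSVGKTSLITRFMYDSFDNTYQATIGIDFLSKTMYLEDR"

ident, length, pct = identity_stats(QUERY_1_60, SBJCT_1_60)
print(f"{ident}/{length} identical ({pct:.0f}%)")

# The whole-HSP figure reported above, 186/208, rounds to 89%.
print(f"186/208 = {100 * 186 / 208:.0f}%")
```

For this first block the five mismatching columns correspond to the five blanks in the match line.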





Pedant information for DKFZphf kd2_4kl4 , frame 3 



Report for DKFZphf kd2_4kl4 . 3 



[LENGTH] 254 
[MW] 28385.29 
[pI] 7.58 
[HOMOL] PIR:G34323 GTP-binding protein Rab6 - human 1e-102 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YLR262c] 7e-60 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YLR262c] 7e-60 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 3e-28 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 8e-27 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 8e-27 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 4e-13 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 4e-13 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 2e-09 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-08 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-08 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 1e-05 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 5e-05 

[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 

[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit) , insertion domai 1e-32 

[SCOP] d1mh1 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-51 

[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 7e-53 

[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-46 

[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) [Do 6e-60 

[PIRKW] nucleus 2e-14 

[PIRKW] cell cycle control 5e-15 

[PIRKW] membrane trafficking 3e-71 

[PIRKW] endoplasmic reticulum 1e-29 

[PIRKW] phosphoprotein 1e-29 

[PIRKW] prenylated cysteine 2e-36 

[PIRKW] signal transduction 5e-15 

[PIRKW] transforming protein 5e-30 

[PIRKW] purine nucleotide binding 1e-28 

[PIRKW] alternative splicing 1e-18 

[PIRKW] P-loop 3e-71 



[PIRKW] lipoprotein 2e-36 

[PIRKW] proto-oncogene 1e-20 

[PIRKW] methylated carboxyl end 1e-20 

[PIRKW] membrane protein 1e-29 

[PIRKW] GTP binding 3e-71 

[PIRKW] thiolester bond 1e-29 

[PIRKW] Golgi apparatus 1e-29 

[SUPFAM] ras transforming protein 1e-76 

[PROSITE] BACTERIAL_OPSIN_RET 1 

[PFAM] Ras family (contains ATP/GTP binding P-loop) 

[KW] Alpha_Beta 

[KW] 3D 



SEQ MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 

1kao- CCEEEEEEECTTTTCHHHHHHHHHHCCCCCCCTTTTC-EEEEEEEEETTE 

SEQ TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 

1kao- EEEEEEEECCTTTTCHHHHHHHHHHCCEEEEEEETTTHHHHHHHHHHHHHHHHHTTTCCC 

SEQ ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 

1kao- EEEEEETTTTGGGCCCCHHHHHHHHHHHCCCEEECTTTTHHHHHHHHHHH 

SEQ QDGSREDMSDIKLEKPQEQTVSEGGCSCYSPMSSSTLPQKPPYSFIDCSVNIGLNLFPSL 

1kao- 



SEQ ITFCNSSLLPVSWR 
1kao- 



Prosite for DKFZphf kd2_4kl4 . 3 
PS00327 45->57 BACTERIAL_OPSIN_RET PDOC00291 



Pfam for DKFZphf kd2_4kl4. 3 



HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM       *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
           KLV++G+ +V K++L RF +++F++ Y + IG+DF++KT+++++ TI 
Query  15  KLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDGTIG 63 

HMM        LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
           L +WDTAGQER RS+ P Y+R++ ++++VYDITN SF+ ++W++++R+ 
Query  64  LRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRT 113 

HMM        HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
           + ++V+I LVGN +DL+D+RQVS EEG+ A+ ++ + F+ET AKT+ 
Query 114  ERG--SDVIITLVGNRTDLADKRQVSVEEGERKAKGLN-VTFIETRAKTG 160 

HMM        iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk....rCCCIM* 
           +NV++ F +++ +++ +++ + +++++++I+ ++++ + +C+ + 
Query 161  YNVKQLFRRVAAALPGMESTQDGSREDMSDIKLEKPQEQTVSEGGCS-C 208 

group: transmembrane protein 

DKFZphfbr2-4mll encodes a novel 159 amino acid protein with weak similarity to the putative 
membrane protein YMR034c of S. cerevisiae. 

The novel protein contains 4 transmembrane regions. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker of neuronal cells. 



weak similarity to YMR034c 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1749 bp 

Poly A stretch at pos. 1727, polyadenylation signal at pos. 1713 



1 GGGGTCCTCA AAGCCGCCGG AGCAACCCCC AGGTCTTTAC TTTACAATCG 

51 GCAATTTGAC TTGCTCTGCT GCATGTCTGG AGGGACCAAG GAAAGTGTGG 

101 AGACGCTCCA AGGATTAGGT GATCGGAGCT TGAAAAGAAA AAAAGCCAAA 

151 CAAATAAACA AAACCCACCC ACCCTAACGA ATATGAGGCT GCTGGAGAGA 

201 ATGAGGAAAG ACTGGTTCAT GGTCGGAATA GTGCTGGCGA TCGCTGGAGC 

251 TAAACTGGAG CCGTCCATAG GGGTGAATGG GGGACCACTG AAGCCAGAAA 

301 TAACTGTATC CTACATTGCT GTTGCAACAA TATTCTTTAA CAGTGGACTA 

351 TCATTGAAAA CAGAGGAGCT GACCAGTGCT TTGGTGCATC TAAAACTGCA 

401 TCTTTTTATT CAGATCTTTA CTCTTGCATT CTTCCCAGCA ACAATATGGC 

451 TTTTTCTTCA GCTTTTATCA ATCACACCCA TCAACGAATG GCTTTTAAAA 

501 GGTTTGCAGA CAGTAGGTTG CATGCCTCCG CCTGTGTCTT CTGCAGTGAT 

551 TTTAACCAAG GCAGTTGGTG GAAATGAGGC AGCTGCAATA TTTAATTCAG 

601 CCTTTGGAAG TTTTTTGGTA AGTAAACATA GTTTAACTTG TCTATTACAA 

651 CTTTTGCTGT GATATTGTGT ATATGAAAGA TTTAGTGAAA GCTGGATTTG 

701 TTTTACTCTT TGGTTAAGTA TAAAAATTGT TGAATCTTTT CATGTGCCAG 

751 TATCCATACC CTGAAGAAAA GTAGTTAATG AATAAAGCAA ATGTTCTCTT 

801 ACAATATATT TTGGAGGTTT GGATTTTAAA ATTCCATTTA ATGAATTCAA 

851 GGAATCAATT AAAACACTAT GTGTCTCCTT ATAGAGGTTA TGTCAATATA 

901 TTGATCATTT AATGAGGTCT TTTAGATTAT TATTATTTTG TATCATGGGA 

951 CTGAGGATTT TGAAAAGGAA ACATGACCCA GCTGGTCAGA AAGGGAATGC 

1001 TAATTTACTT GTTGACATGC CATTTATTTT GTACATTTCA CTGTCAAAGA 

1051 AGCTACTGGC TTGGATGCTT CTGAGAAATC TATGTGAGAA AAAATTTGAA 

1101 AGGAAGATAT GACTAATGAG TAATTTGCAA GTAAATGTTG TATCTATATA 

1151 TATATATATA TAAAGATTCA AAAGTAGTTC AGCTTTCATA AGTAGAACCA 

1201 ATATAAGGAC GTTGTTTTAG CATTTTTAAT CATTATTTTT AAATAAATGA 

1251 TGTAACAGAG GCTTGATTTG TGTTATGAAA GATTGAGAAA CTAAATTTTC 

1301 TGTTGATTTA ATTTTTTTGT GCCTTAAAAC TTTGTTAAAT TCCTGAAGTT 

1351 AATTATCATA TTGTACTTTT TGGGGCATAA CTCATTAGCA GATATGTAGT 

1401 GCAGTGATTT ACAAATAATT GAGAGTAAAA TCAGTGATGT ATAAACTAGT 

1451 TCATGAGTCT AGGTAAAATA TCAATTACCT CTGTTTAAAA TGCTCTGTTA 

1501 ATTATTATTG TATGTATTTA AATGTAGTTA AAGCTTTTAA ACATGTTGTT 

1551 ACATAGTGTT AATTCTACAC AGTGCTACAC AGCTTTTAGT GTCACATAGC 

1601 CTTACAGAGT TTATAATGAT GTAGCATCTG CAAAATATAT GCATAGCTTA 

1651 TATCCTATTT TTATAGAGCC AGTAATGGTT TTTGTGATGC TGTATTACTT 

1701 CTGGGTTTTA GACAATAAAG TCTGTTTAAC AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 183 bp to 659 bp; peptide length: 159 
Category: similarity to unknown protein 



1 MRLLERMRKD WFMVGIVLAI AGAKLEPSIG VNGGPLKPEI TVSYIAVATI 
51 FFNSGLSLKT EELTSALVHL KLHLFIQIFT LAFFPATIWL FLQLLSITPI 
101 NEWLLKGLQT VGCMPPPVSS AVILTKAVGG NEAAAIFNSA FGSFLVSKHS 
151 LTCLLQLLL 
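The peptide above is the conceptual translation of the ORF that starts at position 183 of the insert. As a minimal sketch of that step, assuming the standard genetic code (nothing in the listing states which translation table was used), the first 30 bases of the ORF (positions 183 to 212 of the cDNA listed above) can be translated as follows. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Positions 183-212 of the insert, copied from the sequence listing.
orf_start = "ATGAGGCTGCTGGAGAGAATGAGGAAAGAC"
print(translate(orf_start))
```

The result, MRLLERMRKD, agrees with the first ten residues of the peptide listing.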

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_4mll, frame 3 

PIR:S53951 probable membrane protein YMR034C - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 171, P = 3.2e-12 

PIR:A65015 yfeH protein - Escherichia coli (strain K-12), N = 1, Score 
= 131, P = 4.2e-08 



>PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces 
cerevisiae) 

Length = 434 

HSPs: 

Score = 171 (25.7 bits), Expect = 3.2e-12, P = 3.2e-12 
Identities = 38/144 (26%), Positives = 72/144 (50%) 

Query: 5 ERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKTEELT 64 

E ++ WF + + + I A+ P+ +GG +K + ++ Y VA IF SGL +K+ L 
Sbjct: 18 EFLKSQWFFICLAILIVIARFAPNFARDGGLIKGQYSIGYGCVAWIFLQSGLGMKSRSLM 77 

Query: 65 SALVHLKLHLFIQI FTLAFFPATIWLF LQLLSITP I NEWLLKGLQT VGCMPPPVSS A 121 

+ +++ +HI++ +++F +++ I++W+L GL P V+S 

Sbjct: 78 ANMLNWRAHATILVLSFLITSSIVYGFCCAVKAANDPKIDDWVLIGLILTATCPTTVASN 137 

Query: 122 VILTKAVGGNEAAAIFNSAFGSFL 145 

VI+T GGN + G+ L 

Sbjct: 138 VIMTTNAGGNSLLCVCEVFIGNLL 161 



Pedant information for DKFZphf kd2_4mll, frame 3 



Report for DKFZphfkd2_4mll . 3 



[LENGTH] 159 

[MW] 17282.92 

[pI] 9.06 

[HOMOL] PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces cerevisiae) 
5e-12 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR034c] 2e-13 

[PROSITE] MYRISTYL 2 

[PROSITE] PKC_PHOSPHO_SITE 1 

[KW] TRANSMEMBRANE 4 



SEQ MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT 

PRD ccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeeeeccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMM . . 

SEQ EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS 

PRD hhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhhhheeeecccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL 

PRD ceeeeeccccchhhhhhhcccccceeecceeeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphf kd2_4mll . 3 

PS00005 57->60 PKC_PHOSPHO_SITE PDOC00005 

PS00008 15->21 MYRISTYL PDOC00008 

PS00008 129->135 MYRISTYL PDOC00008 
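The PKC_PHOSPHO_SITE entry (PS00005) is a short consensus pattern, [ST]-x-[RK], so hits of this kind can be re-derived with a regular expression. The sketch below is an illustration only, not the Prosite scanner used for the report: it searches the 159-residue peptide for that pattern and reports 1-based start positions. The names `PKC_SITE` and `find_pkc_sites` are hypothetical, and the end-coordinate convention of the listing ("57->60") is not reproduced.

```python
import re

# PS00005 consensus: Ser or Thr, any residue, then Arg or Lys.
# A lookahead is used so that overlapping sites would also be reported.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")

PEPTIDE = (
    "MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATI"
    "FFNSGLSLKTEELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPI"
    "NEWLLKGLQTVGCMPPPVSSAVILTKAVGGNEAAAIFNSAFGSFLVSKHS"
    "LTCLLQLLL"
)


def find_pkc_sites(peptide: str):
    """Return (1-based start, matched tripeptide) for each [ST]-x-[RK] site."""
    return [(m.start() + 1, m.group(1)) for m in PKC_SITE.finditer(peptide)]


for start, site in find_pkc_sites(PEPTIDE):
    print(start, site)
```

A single site, SLK starting at residue 57, is found, in line with the one PKC_PHOSPHO_SITE hit in the report.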

(No Pfam data available for DKFZphf kd2_4mll . 3) 

DKFZphutel_17k7 



group: uterus derived 

DKFZphutel_17k7 encodes a novel 520 amino acid protein with weak similarity to S. cerevisiae 
Fip1. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



similarity to S. cerevisiae Fip1 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1914 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1867 



1 CGGACGCGTG GGCGGACGCG TGGGGCCTTC CTGGGATTGG AGTCTCGAGC 

51 TTTCTTCGTT CGTTCGCCGG CGGGTTCGCG CCCTTCTCGC GCCTCGGGGC 

101 TGCGAGGCTG GGGAAGGGGT TGGAGGGGGC TGTTGATCGC CGCGTTTAAG 

151 TTGCGCTCGG GGCGGCCATG TCGGCCGGCG AGGTCGAGCG CCTAGTGTCG 

201 GAGCTGAGCG GCGGGACCGG AGGGGATGAG GAGGAAGAGT GGCTCTATGG 

251 CGATGAAAAT GAAGTTGAAA GGCCAGAAGA AGAAAATGCC AGTGCTAATC 

301 CTCCATCTGG AATTGAAGAT GAAACTGCTG AAAATGGTGT ACCAAAACCG 

351 AAAGTGACTG AGACCGAAGA TGATAGTGAT AGTGACAGCG ATGATGATGA 

401 AGATGATGTT CATGTCACTA TAGGAGACAT TAAAACGGGA GCACCACAGT 

451 ATGGGAGTTA TGGTACAGCA CCTGTAAATC TTAACATCAA GACAGGGGGA 

501 AGAGTTTATG GAACTACAGG GACAAAAGTC AAAGGAGTAG ACCTTGATGC 

551 ACCTGGAAGC ATTAATGGAG TTCCACTCTT AGAGGTAGAT TTGGATTCTT 

601 TTGAAGATAA ACCATGGCGT AAACCTGGTG CTGATCTTTC TGATTATTTT 

651 AATTATGGGT TTAATGAAGA TACCTGGAAA GCTTACTGTG AAAAACAAAA 

701 GAGGATACGA ATGGGACTTG AAGTTATACC AGTAACCTCT ACTACAAATA 

751 AAATTACGGT ACAGCAGGGA AGAACTGGAA ACTCAGAGAA AGAAACTGCC 

801 CTTCCATCTA CAAAAGCTGA GTTTACTTCT CCTCCTTCTT TGTTCAAGAC 

851 TGGGCTTCCA CCGAGCAGGA GATTACCTGG GGCAATTGAT GTTATCGGTC 

901 AGACTATAAC TATCAGCCGA GTAGAAGGCA GGCGACGGGC AAATGAGAAC 

951 AGCAACATAC AGGTCCTTTC TGAAAGATCT GCTACTGAAG TAGACAACAA 

1001 TTTTAGCAAA CCACCTCCGT TTTTCCCTCC AGGAGCTCCT CCCACTCACC 

1051 TTCCACCTCC TCCATTTCTT CCACCTCCTC CGACTGTCAG CACTGCTCCA 

1101 CCTCTGATTC CACCACCGGG TTTTCCTCCT CCACCAGGCG CTCCACCTCC 

1151 ATCTCTTATA CCAACAATAG AAAGTGGACA TTCCTCTGGT TATGATAGTC 

1201 GTTCTGCACG TGCATTTCCA TATGGCAATG TTGCCTTTCC CCATCTTCCT 

1251 GGTTCTGCTC CTTCGTGGCC TAGTCTTGTG GACACCAGCA AGCAGTGGGA 

1301 CTATTATGCC AGAAGAGAGA AAGACCGAGA TAGAGAGAGA GACAGAGACA 

1351 GAGAGCGAGA CCGTGATCGG GACAGAGAAA GAGAACGCAC CAGAGAGAGA 

1401 GAGAGGGAGC GTGATCACAG TCCTACACCA AGTGTTTTCA ACAGCGATGA 

1451 AGAACGATAC AGATACAGGG AATATGCAGA AAGAGGTTAT GAGCGTCACA 

1501 GAGCAAGTCG AGAAAAAGAA GAACGACATA GAGAAAGACG ACACACGGAG 

1551 AAAGAGGAAA CCAGACATAA GTCTTCTCGA AGTAATAGTA GACGTCGCCA 

1601 TGAAAGTGAA GAAGGAGATA GTCACAGGAG ACACAAACAC AAAAAATCTA 

1651 AAAGAAGCAA AGAAGGAAAA GAAGCGGGCA GTGAGCCTGC CCCTGAACAG 

1701 GAGAGCACCG AAGCTACACC TGCAGAATAG GCATGGTTTT GGCCTTTTGT 

1751 GTATATTAGT ACCAGAAGTA GATACTATAA ATCTTGTTAT TTTTCTGGAT 

1801 AATGTTTAAG AAATTTACCT TAAATCTTGT TCTGTTTGTT AGTATGAAAA 

1851 GTTAACTTTT TTTCCAAAAT AAAAGAGTGA ATTTTTCATG TTAAGTTAAA 
1901 AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 168 bp to 1727 bp; peptide length: 520 
Category: similarity to known protein 



1 MSAGEVERLV SELSGGTGGD EEEEWLYGDE NEVERPEEEN ASANPPSGIE 
51 DETAENGVPK PKVTETEDDS DSDSDDDEDD VHVTIGDIKT GAPQYGSYGT 
101 APVNLNIKTG GRVYGTTGTK VKGVDLDAPG SINGVPLLEV DLDSFEDKPW 
151 RKPGADLSDY FNYGFNEDTW KAYCEKQKRI RMGLEVIPVT STTNKITVQQ 
201 GRTGNSEKET ALPSTKAEFT SPPSLFKTGL PPSRRLPGAI DVIGQTITIS 
251 RVEGRRRANE NSNIQVLSER SATEVDNNFS KPPPFFPPGA PPTHLPPPPF 
301 LPPPPTVSTA PPLIPPPGFP PPPGAPPPSL IPTIESGHSS GYDSRSARAF 
351 PYGNVAFPHL PGSAPSWPSL VDTSKQWDYY ARREKDRDRE RDRDRERDRD 
401 RDRERERTRE RERERDHSPT PSVFNSDEER YRYREYAERG YERHRASREK 
451 EERHRERRHR EKEETRHKSS RSNSRRRHES EEGDSHRRHK HKKSKRSKEG 
501 KEAGSEPAPE QESTEATPAE 

BLASTP hits 

Entry AF016427_4 from database TREMBL: 

gene: "F32D1.9"; Caenorhabditis elegans cosmid F32D1. 

Score = 392, P = 1.8e-36, identities = 156/519, positives = 212/519 

Entry S62454 from database PIR: 

hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces 
pombe) 

Score = 246, P = 2.0e-22, identities = 62/163, positives = 91/163 

Entry A56545 from database PIR: 

FIP1 protein - yeast (Saccharomyces cerevisiae) 

Score = 186, P = 2.9e-16, identities = 56/206, positives = 92/206 



Alert BLASTP hits for DKFZphutel_17k7, frame 3 

TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial 
cds; PS1 and hypothetical protein genes, complete cds; and S171 gene, 
partial cds., N = 2, Score = 236, P = 1.5e-16 



>TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial cds; 
PS1 and hypothetical protein genes, complete cds; and S171 gene, partial 
cds. 

Length = 735 

HSPs: 



Score = 236 (35.4 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 
Identities = 51/120 (42%), Positives = 76/120 (63%) 



Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYA---ER 439 
           REK+++RER+R+R+RDRDR +ER+R R+RER+RD S + +++R R RE + ER 
Sbjct: 227 REKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSS-DRNKDRSRSREKSRDRER 285 

Query: 440 GYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSK 498 
           ER R + ER RER R RE+E R + + +R E +E D++ R K ++ R K 
Sbjct: 286 EREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREK 345 

Query: 499 E 499 
           E 
Sbjct: 346 E 346 




Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14 
Identities = 50/133 (37%), Positives = 75/133 (56%) 

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440 
           RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER 
Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266 

Query: 441 YERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSKEG 500 
           +R++ E+ R+R RE+E R+R R R E+R+++KK 
Sbjct: 267 SDRNKDRSRSREKSRDRE-RERERERERE-REREREREREREREREREREREREKDKKRD 324 

Query: 501 KEAGSEPAPEQESTE 515 
           +E E A E+ E 
Sbjct: 325 REEDEEDAYERRKLE 339 

Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14 
Identities = 55/141 (39%), Positives = 80/141 (56%) 

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440 
           RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER 
Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266 

Query: 441 YERHR-ASREKEE-RHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497 
           +R++ SR +E+ R RER R RE+E R + R EE RKKKR 
Sbjct: 267 SDRNKDRSRSREKSRDREREREREREREREREREREREREREREREREREREKDKKRDRE 326 

Query: 498 KEGKEAGSEPAPEQESTEATPA 519 
           ++ ++A E++ E A 
Sbjct: 327 EDEEDAYERRKLERKLREKEAA 348 

Score = 210 (31.5 bits), Expect = 1.2e-13, Sum P(2) = 1.2e-13 
Identities = 59/142 (41%), Positives = 78/142 (54%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS---DEERYRYREYAER 439 
           RE++RDR+RDR +ERDRDRDRER+R R+RER D + S D ER R RE ER 
Sbjct: 235 RERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERERERE-RER 293 

Query: 440 GYERHRA-SREKE-ERHRER-RHREKEETRHKSS-------RSNSRRRHESEEGDSHRRH 489 
           ER R RE+E ER RER R REK++ R + R R+ +E R 
Sbjct: 294 EREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREKEAAYQERL 353 

Query: 490 KHKKSKRSKEGKEAGSEPAPEQE 512 
           K+ + + K+ +E E E+E 
Sbjct: 354 KNWEIRERKKTREYEKEAEREEE 376 

Score = 205 (30.8 bits), Expect = 4.4e-13, Sum P(2) = 4.4e-13 
Identities = 59/149 (39%), Positives = 83/149 (55%) 

Query: 372 DTSKQWDYYARREKDRDR--ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429 
           + K+ + R++DRDR ERDRDR+R+RDRDR+RER+ +R ++R S S D E 
Sbjct: 228 EKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKS---RDRE 284 

Query: 430 RYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKSS-------RSNSRRRHE 479 
           R R RE ER ER R RE+E ER RER R REK++ R + R R+ 
Sbjct: 285 RERERE-REREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLR 343 

Query: 480 SEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 
           +E R K+ + + K+ +E E E+E 
Sbjct: 344 EKEAAYQERLKNWEIRERKKTREYEKEAEREEE 376 

Score = 202 (30.3 bits), Expect = 9.6e-13, Sum P(2) = 9.6e-13 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442 
           REK RDRER+R+RER+R+R+RERER RERERER+ D++R REE YE 
Sbjct: 277 REKSRDRERERERERERERERERERERERERERERSRERER-EKDKKRDR-EEDEEDAYE 334 

Query: 443 RHRASREKEERHRERRHREKEETRHKSSRSNSRR-RHESEEGDSHRRHKHKKSKRSKE 499 
           R + E++ R +E ++E+ + R +R E+E + RR K++KR KE 
Sbjct: 335 RRKL--ERKLREKEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE 390 

Score = 183 (27.5 bits), Expect = 1.2e-10, Sum P(2) = 1.2e-10 
Identities = 52/141 (36%), Positives = 79/141 (56%) 

Query: 372 DTSKQWDYY-ARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429 
           DT K+ + ++EK+R E++R RER+R+R+RERER RERERER+ ++E 
Sbjct: 178 DTHKKLEEEKGKKEKERQEIEKER-RERERERERERER-RERERERERER-----EREKE 230 

Query: 430 RYRYREYAERGYERHRASREKEERHRER---RHREKEETRHKSSRSNSRRRHESEEGDSH 486 
           + R RE ER +R R +R RER R RE+ R+K RS SR + E + 
Sbjct: 231 KERERE-RERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKD-RSRSREKSRDRERERE 288 

Query: 487 RRHKHKKSKRSKEGKEAGSEPAPEQE 512 
           R + ++ + + +E E E+E 
Sbjct: 289 RERERERERERERERERERERERERE 314 

Score = 171 (25.7 bits), Expect = 2.5e-09, Sum P(2) = 2.5e-09 
Identities = 49/150 (32%), Positives = 78/150 (52%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442 
           RE++R+RER+R+ RER+R+R+RERER RERERER+ +E+ Y R+ + E 
Sbjct: 285 REREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLRE 344 

Query: 443 RHRASREK------EERHRERRHR---EKEETRHKSSRSNSRRRHES-EEGDSHRRH-KH 491 
           + A +E+ ER + R + E+EE R + ++R E E+ D R K+ 
Sbjct: 345 KEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKY 404 

Query: 492 KKSKRSKEGKEAGSEPAPEQESTE 515 
           +K R +E + E ++E E 
Sbjct: 405 YRGSALQKRLRDREKEMEADERDRKREKEE 434 

Score = 162 (24.3 bits), Expect = 2.4e-08, Sum P(2) = 2.4e-08 
Identities = 45/141 (31%), Positives = 74/141 (52%) 



Query: 372 DTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERY 431 
           + SK D + + E+++ ++ +E +++R RERER RERERER + ER 
Sbjct: 172 EISKFRDTHKKLEEEKGKKEKERQEIEKER-RERERERERERERRERERER--ERERERE 228 

Query: 432 RYREYAERGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHK 490 
           + +E ER ER R +ER R+R R R+++ R +SS N R E+ R + 
Sbjct: 229 KEKE-RERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERER 287 

Query: 491 HKKSKRSKEGKEAGSEPAPEQE 512 
           + + +R +E +E E E+E 
Sbjct: 288 ERERERERE-RERERERERERE 308 

Score = 137 (20.6 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05 
Identities = 48/152 (31%), Positives = 68/152 (44%) 

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422 
           AP P + T + + E RD R+ + RD + E E+ + +E+ER 
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKER 201 

Query: 423 VFNSDEERYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKSS-RSNSRRRH 478 
           + ER R RE ER ER R REKE ER RER R R+++ T+ + R R R 
Sbjct: 202 R-ERERERERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRD 260 

Query: 479 ESEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 
           E S R + S + +E E E+E 
Sbjct: 261 RDRERSSDRNKDRSRSREKSRDRERERERERERE 294 

Score = 126 (18.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04 
Identities = 41/149 (27%), Positives = 66/149 (44%) 

Query: 375 [sequence illegible in source] 429 
           K W+ R+K R+ E + + +RE +R R+ +E R +E D+ P + ++ 
Sbjct: 354 KNWEI-RERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKYYRGSALQK 412 

Query: 430 [sequence illegible in source] 481 
           R R RE ER R REKEE R+ H + + + + RRR + 
Sbjct: 413 RLRDREKEMEADERDR-KREKEELEEIRQRLLAEGHPDPDAELQRMEQEAERRRQPQIKQ 471 

Query: 482 EGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 
           E +S + K+ K K + E PEQ+ 
Sbjct: 472 EPESEEEEEEKQEKEEKREEPMEEEEEPEQK 502 

Score = 124 (18.6 bits), Expect = 3.0e-04, Sum P(2) = 3.0e-04 
Identities = 41/141 (29%), Positives = 65/141 (46%) 

Query: 380 YARREKDRD-RERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAE 438 
           Y R K+ + RER + RE +++ +RE ER RE +E + + D++R + Y 
Sbjct: 349 YQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE-FLEDYDDDRDDPKYYRG 407 

Query: 439 RGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497 
           + + REKE ER R REKEE R + H ++R + ++R 
Sbjct: 408 SALQKRLRDREKEMEADERDRKREKEELEEIRQRLLAEG-HPDPDAELQRMEQEAERRRQ 466 

Query: 498 KEGKEAGSEPAPEQESTEATPAE 520 
           + K+ EP E+E E E 
Sbjct: 467 PQIKQ---EPESEEEEEEKQEKE 486 

Score = 121 (18.2 bits), Expect = 6.2e-04, Sum P(2) = 6.2e-04 
Identities = 43/149 (28%), Positives = 67/149 (44%) 

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422 
           AP P + T + + E RD R+ + RD + E E+ + +E+ER 
Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKE- 200 

Query: 423 VFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEE 482 
           + ER R RE R ER R RE+E + R RE+E R + R+ R R E 
Sbjct: 201 --RRERERERERERERRERERER-EREREREKEKERERERERDRDRD-RTKERDRDRDRE 256 

Query: 483 GDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 
           D R + + S R+K+ + E + ++E 
Sbjct: 257 RDRDR-DRERSSDRNKD-RSRSREKSRDRE 284 

Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 25/73 (34%), Positives = 33/73 (45%) 

Query: 428 EERYRYREYAERGYERHRASREKE-ERHRERRHREKEETRHKSSRSNSRRRHESEEGDSH 486 
           EE +E + E+ R RE+E ER RERR RE+E R + REE 
Sbjct: 184 EEEKGKKEKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDR 243 

Query: 487 RRHKHKKSKRSKE 499 
           R K + R +E 
Sbjct: 244 DRTKERDRDRDRE 256 

Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 31/87 (35%), Positives = 45/87 (51%) 

Query: 382 RREKDRDRERDRDRERDRDRDRER-ERTRERERERDHSPTPSVFNSDEERYRYREYAERG 440 
           +R +DR++E + D ERDR R++E E R+R H P P D E R + AER 
Sbjct: 412 KRLRDREKEMEAD-ERDRKREKEELEEIRQRLLAEGH-PDP-----DAELQRMEQEAERR 464 

Query: 441 YERHRASREKEERHRERRHREKEETRHK 468 
           + + +E E E +EKEE R + 
Sbjct: 465 -RQPQIKQEPESEEEEEEKQEKEEKREE 491 

Score = 46 (6.9 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 





Identities = 13/49 (26%), Positives = 21/49 (42%) 

Query: 54 AENGVPKPKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAP 102 

A NG +P+ +D+ D + D + G I+ +Y S AP 
Sbjct: 70 ASNGNARPETVTNDDEEALDEETKRRDQMIK-GAIEVLIREYSSELNAP 117 

Score = 46 (6.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04 
Identities = 14/53 (26%), Positives = 21/53 (39%) 

Query: 30 ENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDEDDVH 82 

+EERE EE E ++EED D ++DE+D + 

Sbjct: 282 DRERERERERERERERERERERER-EREREREREREREKDKKRDREEDEEDAY 333 

Score = 44 (6.6 bits), Expect = 2.0e-13, Sum P(2) = 2.0e-13 
Identities = 13/60 (21%), Positives = 21/60 (35%) 

Query: 20 DEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDED 79 

++E +++EERE + E K+EEDDD +D 

Sbjct: 191 EKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDRDRTKERD 250 



Pedant information for DKFZphutel_17k7, frame 3 



Report for DKFZphutel_17k7 . 3 



[LENGTH] 520 

[MW] 58375.30 

[pI] 5.41 

[HOMOL] PIR:S62454 hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces pombe) 3e-18 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YJR093c] 2e-13 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJR093c] 2e-13 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 18 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 12 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 35.00 % 



SEQ MSAGEVERLVSELSGGTGGDEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPK 

SEG xxxxxxxxxx 

PRD cccchhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAPVNLNIKTGGRVYGTTGTK 

SEG . . . xxxxxxxxxxxxxxxxx 

PRD cceeeecccccccccccccceeeeeccccccccccccccccceeeeeecccceeeccccc 

SEQ VKGVDLDAPGSINGVPLLEVDLDSFEDKPWRKPGADLSDYFNYGFNEDTWKAYCEKQKRI 

SEG 

PRD ceeeccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RMGLEVIPVTSTTNKITVQQGRTGNSEKETALPSTKAEFTSPPSLFKTGLPPSRRLPGAI 



PRD hhhheeeeeccccceeeeeeecccccccccccccceeeeccccceeeecccccccccccc 

SEQ DVIGQTITISRVEGRRRANENSNIQVLSERSATEVDNNFSKPPPFFPPGAPPTHLPPPPF 

SEG xxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeeecccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ LPPPPTVSTAPPLIPPPGFPPPPGAPPPSLIPTIESGHSSGYDSRSARAFPYGNVAFPHL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccceeeccc 

SEQ PGSAPSWPSLVDTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . . 

PRD ccccccccceeeccccchhhhhhhhhhccccccccccccccchhhhhhhhhhhhcccccc 

SEQ PSVFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHES 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ EEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQESTEATPAE 

SEG XX . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccc 



Prosite for DKFZphutel_17k7 . 3 



PS00001 40->44 ASN_GLYCOSYLATION PDOC00001 
PS00001 278->282 ASN_GLYCOSYLATION PDOC00001 
PS00005 169->172 PKC_PHOSPHO_SITE PDOC00005 
PS00005 193->196 PKC_PHOSPHO_SITE PDOC00005 
PS00005 206->209 PKC_PHOSPHO_SITE PDOC00005 
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005 
PS00005 233->236 PKC_PHOSPHO_SITE PDOC00005 
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00005 373->376 PKC_PHOSPHO_SITE PDOC00005 
PS00005 469->472 PKC_PHOSPHO_SITE PDOC00005 
PS00005 474->477 PKC_PHOSPHO_SITE PDOC00005 
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005 
PS00005 494->497 PKC_PHOSPHO_SITE PDOC00005 
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006 
PS00006 17->21 CK2_PHOSPHO_SITE PDOC00006 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006 
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006 
PS00006 70->74 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 84->88 CK2_PHOSPHO_SITE PDOC00006 
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006 
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006 
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006 
PS00006 426->430 CK2_PHOSPHO_SITE PDOC00006 
PS00007 434->442 TYR_PHOSPHO_SITE PDOC00007 
PS00007 152->161 TYR_PHOSPHO_SITE PDOC00007 
PS00008 15->21 MYRISTYL PDOC00008 
PS00008 96->102 MYRISTYL PDOC00008 
PS00008 115->121 MYRISTYL PDOC00008 
PS00008 130->136 MYRISTYL PDOC00008 
PS00008 154->160 MYRISTYL PDOC00008 
PS00008 229->235 MYRISTYL PDOC00008 
PS00008 244->250 MYRISTYL PDOC00008 
PS00008 289->295 MYRISTYL PDOC00008 
PS00008 362->368 MYRISTYL PDOC00008 
PS00009 253->257 AMIDATION PDOC00009 



(No Pfam data available for DKFZphutel_17k7.3)

group: uterus derived

DKFZphutel_18cl2 encodes a novel 378 amino acid protein nearly identical to human
WUGSC:H_DJ0872F07.1 protein.

The novel protein has an additional N-terminal domain, which is not present in
WUGSC:H_DJ0872F07.1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.



nearly identical to human WUGSC:H_DJ0872F07.1 protein

on genomic level encoded by AC004537, 10 exons; the predicted
protein sequence AC004537_1 is only partially correct: the first exon was not
predicted, and additional exons are predicted
(BLASTX/EST-BLAST shows that the cDNA is only partly spliced)
intron ~1216-3540/V~3577-5059 

Sequenced by AGOWA 

Locus: map="7q31" 

Insert length: 6005 bp 

Poly A stretch at pos. 5980, polyadenylation signal at pos. 5968



1 AGCGGGTGCT GCTAGCGGAG GCGCCATATT GGAGGGGACA AAACTCCGGC 
51 GACAGCGAGT GACACAAATA AACCCCTGGA CCCCCTTGTT CCCTCAGCTC 
101 TAAGGGCCGC GATGTTGTAC CTAGAAGACT ATCTGGAAAT GATTGAGCAG 
151 CTTCCTATGG ATCTGCGGGA CCGCTTCACG GAAATGCGCG AGATGGACCT 
201 GCAGGTGCAG AATGCAATGG ATCAACTAGA ACAAAGAGTC AGTGAATTCT 
251 TTATGAATGC AAAGAAAAAT AAACCTGAGT GGAGGGAAGA GCAAATGGCA 
301 TCCATCAAAA AAGACTACTA TAAAGCTTTG GAAGATGCAG ATGAGAAGGT 
351 TCAGTTGGCA AACCAGATAT ATGACTTGGT AGATCGACAC TTGAGAAAGC 
401 TGGATCAGGA ACTGGCTAAG TTTAAAATGG AGCTGGAAGC TGATAATGCT 
451 GGAATTACAG AAATATTAGA GAGGCGATCT TTGGAATTAG ACACTCCTTC
501 ACAGCCAGTG AACAATCACC ATGCTCATTC ACATACTCCA GTGGAAAAAA 
551 GGAAATATAA TCCAACTTCT CACCATACGA CAACAGATCA TATTCCTGAA 
601 AAGAAATTTA AATCTGAAGC TCTTCTATCC ACCCTTACGT CAGATGCCTC 
651 TAAGGAAAAT ACACTAGGTT GTCGAAATAA TAATTCCACA GCCTCTTCTA 
701 ACAATGCCTA CAATGTGAAT TCCTCCCAAC CTCTGGGATC CTATAACATT 
751 GGCTCGTTAT CTTCAGGAAC TGGTGCAGGG GCAATTACCA TGGCAGCTGC 
801 TCAAGCAGTT CAGGCTACAG CTCAGATGAA GGAGGGACGA AGAACATCAA 
851 GTTTAAAAGC CAGTTATGAA GCATTTAAGA ATAATGACTT TCAGTTGGGA 
901 AAAGAATTTT CAATGGCCAG GGAAACAGTT GGCTATTCAT CATCTTCGGC 
951 ACTTATGACA ACATTAACAC AGAATGCCAG TTCATCAGCA GCCGACTCAC 
1001 GGAGTGGTCG AAAGAGCAAA AACAACAACA AGTCTTCAAG CCAGCAGTCA 
1051 TCATCTTCCT CCTCCTCTTC TTCCTTATCA TCGTGTTCTT CATCATCAAC 
1101 TGTTGTACAA GAAATCTCTC AACAAACAAC TGTAGTGCCA GAATCTGATT 
1151 CAAATAGTCA GGTTGATTGG ACTTACGACC CAAATGAACC TCGATACTGC 
1201 ATTTGTAATC AGGTAAAAGT CTGTTATATC TATAAAAGTA TAATCTGAAT 
1251 AAACTAGAAG GAAGAGAACT ATTTCATTTT TAAGCACTTT TTTAAACTCA
1301 CTTAAAATAC CTTTGCTTTA TTTGTATACT TTTCTCCCCC TTCTTACAAA 
1351 AGTGACATTT GCTGTAAATA CTGAGTATAA AGAAAAATGT TACCCATAAT 
1401 CCTAGCCCTC AGATACAACC TGTAACTAAA CATTTTTGGT ATACCACTAC
1451 CATATACCTC ATGTGCACAT TGGCTGCCTT AATAAAATAC AACAGACTGG
1501 GTAGCTTAAA CAACAGAAAA TAATTTTCTC ACAGGTATGA AGGCTGGGAA 
1551 GTCCAAGATC AAGGTGTCCA CTGACTCAGT TCTGGAGGAG GGCTCCCTTC 
1601 CTAGATGGAG ACTGCTGCCT TCTCACCGGG TCCTCACATG ATAGAGGGAG 
1651 AAAGAGTGTG CTCTGGTGTC TTTTCTTATA AGGGCACCAG CCTTGTCAGA 
1701 GTAGGACCCC ACTCTATGAC CTCATTTAAC CTTTACCACC TCCTCACAGG 
1751 CCCTGTTTCC AATTATAGTC ACGTTGGGGG TTAGGGCTTC AACATATGAT 
1801 TTTGAGACAT AAGCTTGCAT TTCATAACAC GTGTCTATGC AGATTTGCAC 
1851 ATGCATGTGT GTATAAGTTT GTCAGTAGGA ACCACAGTGT ATACTTTCTT 
1901 GTTACTGGCT TTTTTCTCTA AATCAGGTAT ACCGAACATG ATTTTTCTTT 
1951 AAGATCATAT TTTTAATTTT CACATAGTTA TCTCTTATGC CATCCAGTGT 
2001 AGTTTTCTTA ACCAATACCT AGCTATAGAT TATATTAGTG GTTTTAATTT 
2051 GTTTGAAATT AGGGATAATA TTACGATAGG CATTTTTTAA ATGTAATCCA 
2101 TTTTATACAT CTAATTTCTT GGATAATCTT TTAGAAATAA AATTAGGCTG 
2151 TAAATATTTG ACAGACACCA AAATATATTT TCTAGAAATT TATTACCAAA 
2201 AATTAATAAA CATACCGGTT TACTAAACCC TGTCCAACAC TGGATATTAT 
2251 TTTCTTTTAA AAACTAAGTA CCAATTTGGT AGTTTTATAT TATGATTGTT 
2301 TTAAATACAC TAGTATTATT GAAGTTGGAC ATTTTTTGAC CATTTTTGTT 
2351 TTTTACATTA TGAATCGACT CCTAATGGTG TCGGCTGATT TTTCTATTGT
2401 TTTTGTTATG TACTCTAAAT ATTTGCTTGA TTTAGTTTTT TAAAAATAAT
2451 TCTAAAATTT TAATTTTATG TAGTTATGAC TGTTAATTTT TTTTTATGAA
2501 GCAAGCCATG GATTATATAC TTAGAAGGGC TTTCTCTTTG GCTCTTCTTT
2551 CTACAAAAAA TTGTCTTGTA TAATATTTTC TCCTAGTTTT TATATGGTTT
2601 TGTCTAGTTC TTTGCATGCT TCAGTTTCTT CACATTTAAG ACTTAGTCTA
2651 TCAGCAGATT ATTGTGTCTA ACAGTATGAG TTGCCAGTCT GATTTTTAAA
2701 AATTTTAACA ATTTGTTAGC TGTTCCACTA TCACCCGATA AACATTTTTC
2751 AGTACAAATG ATAGAAAAGC ATATCCTGTA TCCTGACAAC AAAAGTAGAT
2801 TACTTGCAAA AGAACAAAAT CAGACTGAAC CTAGAGTTTT CCTCTGTAAC
2851 ACTAAAAAAC TAGAAGGTGA TGGAATATGT CTGTAGAGCT TTCAGGGAAA
2901 AATTAAGAGC CCCCAAAAAC TTGATATTCA GAGAAGTTAT TTCTCTGCAT
2951 AGGACCATGT AAATATATTT TCACTCATGC AGAGAATCAG AAGATATGCC
3001 ATCTAGTTAA TCCTGTCTGA AAAATTATTC AATCCACTGA GAACTTCAGT
3051 GAACTCAAGA ATTAGCAAGT TATGCCCTAA AGTGCTGGTG ATGAAGAGCA
3101 AAAGAAAAAT GAGAAAGGAC ATAAAATAGA TAAGTTTAGA AGTTTCAAGG
3151 AAGGAGACTA TTAATTGCAA AAATATATAT GACCTAATGT GACCCAAGAA
3201 GTAAAAACTT TCAGTAAGTA AATAATCAAG AAAGGAACTT AAAATTTTTA
3251 CAATAAGAAC TACCCAGAAA GATGACTCCT TCATCCGGGT GATTTATATG
3301 TCAAGTTCTT CCAGACTTCT GAAGGGCAGA TAATTCCTGT GCATTTCTTC
3351 CCACCCTTGC CCCACCCTGC CCAAAAGAGT ATTTCAGGAA AAAATTATTA
3401 TACCTTGATT CTCAATGTAA TTGTATATTC AGTGTATTTC CCTTTATTTT
3451 CCAGCAGTAT CATACATAAA CAGTTAATTG GTATCTAGGT GTTTGTTACA
3501 TAGTCATAAT AAAGACATTT AATTTTTTTT AACTAGGTAT CTTATGGTGA
3551 GATGGTGGGA TGTGATAACC AAGATGTAAG TATTACATTT TTCTATTTAG
3601 GAATGAAAAA AATCACAGGT TGTTATTACT TGAATATTTG TCTTATTTGC
3651 TGTATGGTTT GGTCTAAGAA AACAGGTTTG CAGGTATATT AGTTATGTTA
3701 TGCTAATGCT AGAATATTCC TCTTCAAAAT AGGGTAGTGT CCCTTAATGT
3751 GTTCCCTATT TTAATTTTTA AAGCTAATTT TATGGTTTTA TGTGCAGATT
3801 GTCTCAGAAG TGTTATGTTG TATGAAAATT ATAAATACCC TCCTTTCCCT
3851 TTACTAAAAA ATACTGTGTT TACTAGAATC CAGTTCATTT ATCACATTGA
3901 AGAAATGGAA TTTTAAAACA ATTCATTCTT TCAGGCTGCA CCGTGCTAAA
3951 GTGAAGGGTG GGATAATTGA GGATCTAATG TGAGATTATC TTCCTCTCAT
4001 GAGTATAATA TTTTTTCCTG TACTCTGCAG GTGTCAGCTG ATAAGAGCCA
4051 CCCCTGATCT AAAAAGTAAA GGAAATTTGA AAGGAAGGAA TTCTTGGTTT
4101 TTAGGAGACT TAATTTTAGT TAGAGATACG TTTTTTATTC AATACTGAGA
4151 ATATTGTTGT CTAGTAATTT TGACTCCCTC CTTATTTAGT AGTGACAGGA
4201 TCCTAAGATT AACAAGAGTT TTAAATTTGT AAAACAATCT GAAGATTGAG
4251 GGAGCTGGCT AGGTGCATTA AAATGTGTAC TTTTCCTAGA CCTGATAGGG
4301 TTACAGCAAC ATGCTCACGT AGATTGGGAC AGAGCCTCCT TCTGTTTCCC
4351 TGTCTAGAAT CCCTTGTAGG CTGTTTGTGG TTGTTGCAAA AACAATATTG
4401 CCCAACCATT TCAAGAACAT CACTGTAAAC TCTTCTGGGG CAGTTAGTGA
4451 AAATGATGAA TGAGATTTCT ATGAGTACCA GCATCATGCT TCTCTGATTC
4501 TTCTTATTCC CAGTTGTGCT CTTCTGAGTG CTAAGACTTT CATGAAAGAG
4551 TTTTCTGCTT AATATGTTTC AAAGAGGAAT AATTTTTCTC TACATTTCAA
4601 GGAATAGAAA CACCCACGTA GGAAATGCAG GGCATAAGAC ATAAATTAAT
4651 GTCTTTAATT ACAATCAGCT TATTCTACTT TATGAGACAG CAAATAAGGC
4701 TGACTATTAA ATAAAATCTT AAGTTATATT TACCTTCTAC ATAGAAGATT
4751 CATCCCACTT CTTTTTGCCC TTGAAAGCTG AAAACTAGTG AATTTTCATT
4801 CATTAGGATG AGGGGACTAG ATTACATGGA CCTCAGGATT CTTGAAGATG
4851 CATAATTTTT CTGTGCCTTC ATTTCCTCAT TCCTGAAGCT TATCATTTAG
4901 TCTAAATGAT GTCTAAATAA TCTAGATCTA AAAATTCTGA TGTCACACAT
4951 CTAATTATTG TTAAATTAAA TGGATTATTC AGTCTCCTGA GCATATTTTA
5001 ATATACTCTC TTGTCTTCAG AAGTACTGAA AACTTGTTTT TTGCAATTTT
5051 GCTTTCTAGT GCCCTATAGA ATGGTTCCAT TATGGCTGCG TTGGATTGAC
5101 AGAGGCACCA AAAGGCAAAT GGTACTGTCC ACAGTGCACT GCTGCAATGA
5151 AGAGAAGAGG CAGCAGACAC AAATAAAGGT GGTCCTTTTG TTTGATGAAG
5201 AAATAAACTT CAGCTGAAGA TTTTATATAG GACTTTAAAA AGAAGAGAAG
5251 AGAAAGAAGA AACAATGCAT TTCCAGGCAA CCACTTAAAG GATTTACATA
5301 GACAATCCTA TAAGATCTTG AACTTGAATT TTATGGGTTG TATTTTAATA
5351 ATGTAAGTAA ATTATTTATG CACTCCTGGT GTGCTATGAA TATTATTCCA
5401 GTTAGCCTTG GATTATTTCA GTGGCCAACA TATGCAGACA TTTGTACTCC
5451 TCAACCATTT TCTCAAAGTA ATGGGCATTC TATGATTTAG ACTTCAAGGA
5501 ATTCCAATGA TGAAGATTTT AAGGAAAGTA TTTTATATTC AACAGGTATA
5551 TTCTGCTGCA TGTACTGTAC TCCAGAGCTG TTATGTAACA CTGTATATAA
5601 ATGGTTGCAA AAAAAAAAAA AAGTCAGTGC TTCTAAAAAG AATTTAAGAT
5651 AATGGTTTTT AAAATGCCTT TATAATAAGC TTTGTTTCTT TGTGAAACTA
5701 ATTCAGCAGG CTGAAGGAAA TGGTTCATGT GATAATGTGG GCTGGTATCC
5751 TCTAGAGTAC CTGGGTACAT AAACAGAAAC TCCTGTAGGT AAAAAGTAAT
5801 TTGTGCCATT AGTCTTTCTA TGTTTCTGCA TCCAGATAGA GTGCAGTTCA
5851 TGAGGGAGGG GGCGGGGGAC TGAAGGGGAA AGGGCGTTAA AGTGATACAT
5901 TTTTATACCA AATGTGTTTA TTTTTTTGTG CAAGTAATCC TTAAAATTGC
5951 AATTGTATTA GGTGTTAAAA TAAAGTTTTT AAAAAATTAA AAAAAAAAAA
6001 AAAAA



BLAST Results 



Entry HSG20547 from database EMBL:
HSG20547I human STS A005W09.
Length = 154
Minus Strand HSPs:

Score = 770 (115.5 bits), Expect = 2.9e-26, P = 2.9e-26 
Identities = 154/154 (100%) 



Medline entries 



98101645: 

The candidate tumour suppressor p33ING1 cooperates with p53 in cell
growth control.



Peptide information for frame 1 



ORF from 112 bp to 1245 bp; peptide length: 378 
Category: similarity to known protein 
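
The ORF coordinates and the peptide length quoted for each clone can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; both are inferences from the reported numbers, not statements made in the listing. The function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons spanned by an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon
    (an assumption inferred from the reported peptide lengths).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a positive multiple of 3")
    return span // 3


# ORF coordinates as quoted in the clone reports.
print(orf_peptide_length(112, 1245))  # DKFZphutel_18cl2
print(orf_peptide_length(94, 2370))   # DKFZphutel_18il9
print(orf_peptide_length(163, 822))   # DKFZphutel_18i4
```

Under these assumptions the three calls give 378, 759 and 220, which agree with the peptide lengths stated in the respective reports.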



1 MLYLEDYLEM IEQLPMDLRD RFTEMREMDL QVQNAMDQLE QRVSEFFMNA 
51 KKNKPEWREE QMASIKKDYY KALEDADEKV QLANQIYDLV DRHLRKLDQE 
101 LAKFKMELEA DNAGITEILE RRSLELDTPS QPVNNHHAHS HTPVEKRKYN 
151 PTSHHTTTDH IPEKKFKSEA LLSTLTSDAS KENTLGCRNN NSTASSNNAY 
201 NVNSSQPLGS YNIGSLSSGT GAGAITMAAA QAVQATAQMK EGRRTSSLKA 
251 SYEAFKNNDF QLGKEFSMAR ETVGYSSSSA LMTTLTQNAS SSAADSRSGR
301 KSKNNNKSSS QQSSSSSSSS SLSSCSSSST VVQEISQQTT VVPESDSNSQ 
351 VDWTYDPNEP RYCICNQVKV CYIYKSII 
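
As an illustration of how the peptide relates to the cDNA, the following sketch translates the first 30 nucleotides of the ORF (positions 112 to 141 of the insert) with the standard genetic code. The helper names are hypothetical, and the codon table is built from the conventional TCAG ordering rather than taken from the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 until the first stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 112-141 of the DKFZphutel_18cl2 insert, copied from the listing.
orf_start = "ATGTTGTAC CTAGAAGACT ATCTGGAAAT G"
print(translate(orf_start))
```

The output, MLYLEDYLEM, matches the first ten residues of the frame 1 peptide.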

BLASTP hits 

Entry AF044076_1 from database TREMBL:

"ING1"; product: "candidate tumor suppressor p33ING1"; Homo
sapiens candidate tumor suppressor p33ING1 (ING1) mRNA, complete
cds. Homo sapiens (human)
Length = 279

Score = 162 (57.0 bits), Expect = 1.1e-09, P = 1.1e-09
Identities = 48/183 (26%), Positives = 92/183 (50%) 

Entry AC004537_1 from database TREMBL: 

gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from
7q31, complete sequence.

Score = 1814, P = 3.7e-187, identities = 358/358, positives = 358/358

Entry CEY51H1A_1 from database TREMBL:

gene: "Y51H1A.4"; Caenorhabditis elegans cosmid Y51H1A 

Score = 213, P = 3.7e-15, identities = 37/123, positives = 82/123 



Alert BLASTP hits for DKFZphutel_18cl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18cl2, frame 1 



Report for DKFZphutel_18cl2.1



[LENGTH] 378

[MW] 42275.72

[pi] 5.72

[HOMOL] TREMBL:AC004537_1 gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from 7q31, complete sequence, 1e-157

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHR090c] 8e-05

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04

[PROSITE] MYRISTYL 3

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 5

[KW] All_Alpha

[KW] LOW_COMPLEXITY 20.63 %

[KW] COILED_COIL 7.94 %



SEQ MLYLEDYLEMIEQLPMDLRDRFTEMREMDLQVQNAMDQLEQRVSEFFMNAKKNKPEWREE 

SEG 

PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS 

SEQ QMASIKKDYYKALEDADEKVQLANQIYDLVDRHLRKLDQELAKFKMELEADNAGITEILE 

SEG 

PRD hhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ RRSLELDTPSQPVNNHHAHSHTPVEKRKYNPTSHHTTTDHIPEKKFKSEALLSTLTSDAS 

SEG 

PRD hhccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhcccc 

COILS 

SEQ KENTLGCRNNNSTASSNNAYNVNSSQPLGSYNIGSLSSGTGAGAITMAAAQAVQATAQMK 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . . 

PRD cccccccccccccccccccccccccccccccccccccccc cchhhhhhhhhhhhhhhhhh 

COILS 

SEQ EGRRTSSLKASYEAFKNNDFQLGKEFSMARETVGYSSSSALMTTLTQNASSSAADSRSGR 

SEG xxxxxxxxxxxx 

PRD hccccccccchhhhhhccccccccccccccccccccccceeeeecccccccccccccccc 

COILS 

SEQ KSKNNNKSSSQQSSSSSSSSSLSSCSSSSTVVQEISQQTTVVPESDSNSQVDWTYDPNEP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccceeecccccccccccccccccccccccccceeeecccccc 

COILS 

SEQ RYCICNQVKVCYIYKSII 

SEG 

PRD eeeeceeeeeeeeeeccc 

COILS 
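
The LOW_COMPLEXITY and COILED_COIL percentages in the report appear to be the fraction of residues flagged in the SEG and COILS tracks relative to the peptide length. The sketch below reproduces that arithmetic. The residue counts used (78 for SEG, 30 for COILS) are assumptions inferred from the reported percentages and the 378-residue length, not values stated in the listing, and the function name is hypothetical.

```python
def coverage_percent(flagged_residues: int, peptide_length: int) -> float:
    """Percentage of the peptide covered by a feature track, to two decimals."""
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    return round(100.0 * flagged_residues / peptide_length, 2)


PEPTIDE_LENGTH = 378  # from the [LENGTH] entry of the report

# Assumed counts of flagged residues (inferred, see lead-in).
print(coverage_percent(78, PEPTIDE_LENGTH))  # SEG low-complexity track
print(coverage_percent(30, PEPTIDE_LENGTH))  # COILS coiled-coil track
```

With these assumed counts the function returns 20.63 and 7.94, the two values reported under [KW].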



Prosite for DKFZphutel_18cl2.1



PS00001  190->194  ASN_GLYCOSYLATION   PDOC00001
PS00001  191->195  ASN_GLYCOSYLATION   PDOC00001
PS00001  203->207  ASN_GLYCOSYLATION   PDOC00001
PS00001  288->292  ASN_GLYCOSYLATION   PDOC00001
PS00001  306->310  ASN_GLYCOSYLATION   PDOC00001
PS00002  218->222  GLYCOSAMINOGLYCAN   PDOC00002
PS00004  243->247  CAMP_PHOSPHO_SITE   PDOC00004
PS00005  64->67    PKC_PHOSPHO_SITE    PDOC00005
PS00005  247->250  PKC_PHOSPHO_SITE    PDOC00005
PS00005  298->301  PKC_PHOSPHO_SITE    PDOC00005
PS00006  142->146  CK2_PHOSPHO_SITE    PDOC00006
PS00006  156->160  CK2_PHOSPHO_SITE    PDOC00006
PS00006  292->296  CK2_PHOSPHO_SITE    PDOC00006
PS00006  349->353  CK2_PHOSPHO_SITE    PDOC00006
PS00008  186->192  MYRISTYL            PDOC00008
PS00008  214->220  MYRISTYL            PDOC00008
PS00008  219->225  MYRISTYL            PDOC00008
PS00009  241->245  AMIDATION           PDOC00009
PS00009  298->302  AMIDATION           PDOC00009
PS00013  315->326  PROKAR_LIPOPROTEIN  PDOC00013



(No Pfam data available for DKFZphutel_18cl2.1)

DKFZphutel_18il9



group: transcription factors 

DKFZphutel_18il9 encodes a novel 759 amino acid protein with similarity to the SREBP-2 mutant
sterol regulatory element binding protein-2 of Cricetulus griseus.

The SREBP-2 protein is embedded in the membranes of the nucleus and endoplasmic reticulum. In
cholesterol-depleted cells the proteins are cleaved to release soluble NH2-terminal fragments
that enter the nucleus and activate genes encoding the low density lipoprotein receptor and
enzymes of cholesterol synthesis. The new protein is a putative transcription factor capable
of protein-protein interaction via a LIM domain and additionally shows similarity to the
common sunflower transcription factor SF3.

The new protein can find application in modulating/blocking the expression of genes involved 
in lipid metabolism. 



similarity to transcription factor SF3 

complete cDNA, complete cds, EST hits 

strong similarity to mutated SREBP-2 of hamster, 

similarity is not to the SREBP-2 part of the protein but to the unknown part of
the fusion protein

Sequenced by AGOWA 

Locus: /map=12 

Insert length: 3664 bp 

Poly A stretch at pos. 3647, polyadenylation signal at pos. 3636



1 GCGCTAGGTA GAGCGCCGGG ACCTGTGACA GGGCTGGTAG CAGCGCAGAG
51 GAAAGGCGGC TTTTAGCCAG GTATTTCAGT GTCTGTAGAC AAGATGGAAT
101 CATCTCCATT TAATAGACGG CAATGGACCT CACTATCATT GAGGGTAACA
151 GCCAAAGAAC TTTCTCTTGT CAACAAGAAC AAGTCATCGG CTATTGTGGA
201 AATATTCTCC AAGTACCAGA AAGCAGCTGA AGAAACAAAC ATGGAGAAGA
251 AGAGAAGTAA CACCGAAAAT CTCTCCCAGC ACTTTAGAAA GGGGACCCTG
301 ACTGTGTTAA AGAAGAAGTG GGAGAACCCA GGGCTGGGAG CAGAGTCTCA
351 CACAGACTCT CTACGGAACA GCAGCACTGA GATTAGGCAC AGAGCAGACC
401 ATCCTCCTGC TGAAGTGACA AGCCACGCTG CTTCTGGAGC CAAAGCTGAC
451 CAAGAAGAAC AAATCCACCC CAGATCTAGA CTCAGGTCAC CTCCTGAAGC
501 CCTCGTTCAG GGTCGATATC CCCACATCAA GGACGGTGAG GATCTTAAAG
551 ACCACTCAAC AGAAAGTAAA AAAATGGAAA ATTGTCTAGG AGAATCCAGG
601 CATGAAGTAG AAAAATCAGA AATCAGTGAA AACACAGATG CTTCGGGCAA
651 AATAGAGAAA TATAATGTTC CGCTGAACAG GCTTAAGATG ATGTTTGAGA
701 AAGGTGAACC AACTCAAACT AAGATTCTCC GGGCCCAAAG CCGAAGTGCA
751 AGTGGAAGGA AGATCTCTGA AAACAGCTAT TCTCTAGATG ACCTGGAAAT
801 AGGCCCAGGT CAGTTGTCAT CTTCTACATT TGACTCGGAG AAAAATGAGA
851 GTAGACGAAA TCTGGAACTT CCACGCCTCT CAGAAACCTC TATAAAGGAT
901 CGAATGGCCA AGTACCAGGC AGCTGTGTCC AAACAAAGCA GCTCAACCAA
951 CTATACAAAT GAGCTGAAAG CCAGTGGTGG CGAAATCAAA ATTCATAAAA
1001 TGGAGCAAAA GGAGAATGTG CCCCCAGGTC CTGAGGTCTG CATCACCCAT
1051 CAGGAAGGGG AAAAGATTTC TGCAAATGAG AATAGCCTGG CAGTCCGTTC
1101 CACCCCTGCC GAAGATGACT CCCGTGACTC CCAGGTTAAG AGTGAGGTTC
1151 AACAGCCTGT CCATCCCAAG CCACTAAGTC CAGATTCCAG AGCCTCCAGT
1201 CTTTCTGAAA GTTCTCCTCC CAAAGCAATG AAGAAGTTTC AGGCACCTGC
1251 AAGAGAGACC TGCGTGGAAT GTCAGAAGAC AGTCTATCCA ATGGAGCGTC
1301 TCTTGGCCAA CCAGCAGGTG TTTCACATCA GCTGCTTCCG TTGCTCCTAT
1351 TGCAACAACA AACTCAGTCT AGGAACATAT GCATCTTTAC ATGGAAGAAT
1401 CTATTGTAAG CCTCACTTCA ATCAACTCTT TAAATCTAAG GGCAACTATG
1451 ATGAAGGCTT TGGGCACAGA CCACACAAGG ATCTATGGGC AAGCAAAAAT
1501 GAAAACGAAG AGATTTTGGA GAGACCAGCC CAGCTTGCAA ATGCAAGGGA
1551 GACCCCTCAC AGCCCAGGGG TAGAAGATGC CCCTATTGCT AAGGTGGGTG
1601 TCCTGGCTGC AAGTATGGAA GCCAAGGCCT CCTCTCAGCA GGAGAAGGAA
1651 GACAAGCCAG CTGAAACCAA GAAGCTGAGG ATCGCCTGGC CACCCCCCAC
1701 TGAACTTGGA AGTTCAGGAA GTGCCTTGGA GGAAGGGATC AAAATGTCAA
1751 AGCCCAAATG GCCTCCTGAA GACGAAATCA GCAAGCCCGA AGTTCCTGAG
1801 GATGTCGATC TAGATCTGAA GAAGCTAAGA CGATCTTCTT CACTGAAGGA
1851 AAGAAGCCGC CCATTCACTG TAGCAGCTTC ATTTCAAAGC ACCTCTGTCA
1901 AGAGCCCAAA AACTGTGTCC CCACCTATCA GGAAAGGCTG GAGCATGTCA
1951 GAGCAGAGTG AAGAGTCTGT GGGTGGAAGA GTTGCAGAAA GGAAACAAGT
2001 GGAAAATGCC AAGGCTTCTA AGAAGAATGG GAATGTGGGA AAAACAACCT
2051 GGCAAAACAA AGAATCTAAA GGAGAGACAG GGAAGAGAAG TAAGGAAGGT
2101 CATAGTTTGG AGATGGAGAA TGAGAATCTT GTAGAAAATG GTGCAGACTC
2151 CGATGAAGAT GATAACAGCT TCCTCAAACA ACAATCTCCA CAAGAACCCA
2201 AGTCTCTGAA TTGGTCGAGT TTTGTAGACA ACACCTTTGC TGAAGAATTC
2251 ACTACTCAGA ATCAGAAATC CCAGGATGTG GAACTCTGGG AGGGAGAAGT
2301 GGTCAAAGAG CTCTCTGTGG AAGAACAGAT AAAGAGAAAT CGGTATTATG
2351 ATGAGGATGA GGATGAAGAG TGACAAATTG CAATGATGCT GGGCCTTAAA
2401 TTCATGTTAG TGTTAGCGAG CCACTGCCCT TTGTCAAAAT GTGATGCACA
2451 TAAGCAGGTA TCCCAGCATG AAATGTAATT TACTTGGAAG TAACTTTGGA
2501 AAAGAATTCC TTCTTAAAAT CAAAAACAAA ACAAAAAAAC ACAAAAAACA
2551 CATTCTAAAT ACTAGAGATA ACTTTACTTA AATTCTTCAT TTTAGCAGTG
2601 ATGATATGCG TAAGTGCTGT AAGGCTTGTA ACTGGGGAAA TATTCCACCT
2651 GATAATAGCC CAGATTCTAC TGTATTCCCA AAAGGCAATA TTAAGGTAGA
2701 TAGATGATTA GTAGTATATT GTTACACACT ATTTTGGAAT TAGAGAACAT
2751 ACAGAAGGAA TTTAGGGGCT TAAACATTAC GACTGAATGC ACTTTAGTAT
2801 AAAGGGCACA GTTTGTATAT TTTTAAATGA ATACCAATTT AATTTTTTAG
2851 TATTTACCTG TTAAGAGATT ATTTAGTCTT TAAATTTTTT AGGTTAATTT
2901 TCTTGCTGTG ATATATATGA GGAATTTACT ACTTTATGTC CTGCTCTCTA
2951 AACTACATCC TGAACTCGAC GTCCTGAGGT ATAATACAAC AGAGCACTTT
3001 TTGAGGCAAT TGAAAAACCA ACCTACACTC TTCGGTGCTT AGAGAGATCT
3051 GCTGTCTCCC AAATAAGCTT TTGTATCTGC CAGTGAATTT ACTGTACTCC
3101 AAATGATTGC TTTCTTTTCT GGTGATATCT GTGCTTCTCA TAATTACTGA
3151 AAGCTGCAAT ATTTTAGTAA TACCTTCGGG ATCACTGTCC CCCATCTTCC
3201 GTGTTAGAGC AAAGTGAAGA GTTTAAAGGA GGAAGAAGAA AGAACTGTCT
3251 TACACCACTT GAGCTCAGAC CTCTAAACCC TGTATTTCCC TTATGATGTC
3301 CCCTTTTTGA GACACTAATT TTTAAATACT TACTAGCTCT GAAATATATT
3351 GATTTTTATC ACAGTATTCT CAGGGTGAAA TTAAACCAAC TATAGGCCTT
3401 TTTCTTGGGA TGATTTTCTA GTCTTAAGGT TTGGGGACAT TATAAACTTG
3451 AGTACATTTG TTGTACACAG TTGATATTCC AAATTGTATG GATGGGAGGG
3501 AGAGGTGTCT TAAGCTGTAG GCTTTTCTTT GTACTGCATT TATAGAGATT
3551 TAGCTTTAAT ATTTTTTAGA GATGTAAAAC ATTCTGCTTT CTTAGTCTTA
3601 CCTAGTCTGA AACATTTTTA TTCAATAAAG ATTTTAATTA AAATTTGAAA
3651 AAAAAAAAAA AAAA



BLAST Results 



Entry HS512217 from database EMBL: 
human STS SHGC-14654. 
Length = 250 
Minus Strand HSPs: 

Score = 1202 (180.3 bits), Expect = 1.8e-46, P = 1.8e-46 
Identities = 242/244 (99%)



Medline entries 



95263566: 

Three different rearrangements in a single intron truncate 
sterol regulatory element binding protein-2 and produce 
sterol-resistant phenotype in three cell lines. Role of introns 
in protein evolution. 

93258417: 

Characterization of a pollen-specific cDNA from sunflower 
encoding a zinc finger protein. 



Peptide information for frame 1 



ORF from 94 bp to 2370 bp; peptide length: 759 
Category: similarity to known protein 



1 MESSPFNRRQ WTSLSLRVTA KELSLVNKNK SSAIVEIFSK YQKAAEETNM
51 EKKRSNTENL SQHFRKGTLT VLKKKWENPG LGAESHTDSL RNSSTEIRHR
101 ADHPPAEVTS HAASGAKADQ EEQIHPRSRL RSPPEALVQG RYPHIKDGED
151 LKDHSTESKK MENCLGESRH EVEKSEISEN TDASGKIEKY NVPLNRLKMM
201 FEKGEPTQTK ILRAQSRSAS GRKISENSYS LDDLEIGPGQ LSSSTFDSEK
251 NESRRNLELP RLSETSIKDR MAKYQAAVSK QSSSTNYTNE LKASGGEIKI
301 HKMEQKENVP PGPEVCITHQ EGEKISANEN SLAVRSTPAE DDSRDSQVKS
351 EVQQPVHPKP LSPDSRASSL SESSPPKAMK KFQAPARETC VECQKTVYPM
401 ERLLANQQVF HISCFRCSYC NNKLSLGTYA SLHGRIYCKP HFNQLFKSKG
451 NYDEGFGHRP HKDLWASKNE NEEILERPAQ LANARETPHS PGVEDAPIAK
501 VGVLAASMEA KASSQQEKED KPAETKKLRI AWPPPTELGS SGSALEEGIK
551 MSKPKWPPED EISKPEVPED VDLDLKKLRR SSSLKERSRP FTVAASFQST
601 SVKSPKTVSP PIRKGWSMSE QSEESVGGRV AERKQVENAK ASKKNGNVGK
651 TTWQNKESKG ETGKRSKEGH SLEMENENLV ENGADSDEDD NSFLKQQSPQ
701 EPKSLNWSSF VDNTFAEEFT TQNQKSQDVE LWEGEVVKEL SVEEQIKRNR
751 YYDEDEDEE

BLASTP hits
Entry CG22818_1 from database TREMBL: 

"SREBP-2"; product: "mutant sterol regulatory element binding 
protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory 
element binding protein-2 (SREBP-2) mRNA, complete cds . Cricetulus 
griseus (Chinese hamster) 
Length = 839 

Score = 1502 (528.7 bits), Expect = 3.9e-154, P = 3.9e-154 
Identities = 290/380 (76%), Positives = 322/380 (84%) 

Entry S28507 from database PIR: 
transcription factor SF3 - common sunflower 
Length = 219 

Score = 212 (74.6 bits), Expect = 6.3e-18, Sum P(2) = 6.3e-18 
Identities = 36/82 (43%), Positives = 55/82 (67%)

Entry NTLIMDOM_1 from database TREMBL:

"SF3"; product: "LIM-domain SF3 protein"; N.tabacum mRNA for 
LIM-domain protein Nicotiana tabacum (common tobacco) 
Length = 189 

Score = 216 (76.0 bits), Expect = 1.0e-16, P = 1.0e-16 
Identities = 42/94 (44%), Positives = 57/94 (60%) 
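
The percentages printed next to the identity and positive counts can be recomputed from the fractions. In the hits listed here they appear to be truncated rather than rounded (for example 322/380 is reported as 84%, although it is closer to 85%). The sketch below uses integer division on that assumption; the function name is hypothetical and the truncation rule is an inference from the reported values, not a documented property of the BLAST version used.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


# (identities, positives, aligned length) as quoted for the three hits above.
hits = {
    "CG22818_1": (290, 322, 380),
    "S28507": (36, 55, 82),
    "NTLIMDOM_1": (42, 57, 94),
}
for entry, (identities, positives, length) in hits.items():
    print(entry, blast_percent(identities, length), blast_percent(positives, length))
```

Under this assumption the recomputed values are 76/84, 43/67 and 44/60, matching the percentages shown for the three entries.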



Alert BLASTP hits for DKFZphutel_18il9, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18il9, frame 1 



Report for DKFZphutel_18il9.1



[LENGTH] 759

[MW] 85225.57

[pi] 6.41

[HOMOL] TREMBL:CG22818_1 gene: "SREBP-2"; product: "mutant sterol regulatory element binding protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory element binding protein-2 (SREBP-2) mRNA, complete cds. 1e-151

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR257w] 3e-05

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04

[BLOCKS] BL00478B

[PIRKW] zinc finger 9e-16

[PIRKW] DNA binding 9e-16

[SUPFAM] LIM metal-binding repeat homology 9e-16

[PROSITE] MYRISTYL 6

[PROSITE] LIM_DOMAIN_1 1

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 4

[PROSITE] CK2_PHOSPHO_SITE 28

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 15

[PROSITE] ASN_GLYCOSYLATION 6

[PFAM] LIM domain containing proteins

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 5.53 %



SEQ MESSPFNRRQWTSLSLRVTAKELSLVNKNKSSAIVEIFSKYQKAAEETNMEKKRSNTENL

SEG

1ctl-

SEQ SQHFRKGTLTVLKKKWENPGLGAESHTDSLRNSSTEIRHRADHPPAEVTSHAASGAKADQ

SEG

1ctl-

SEQ EEQIHPRSRLRSPPEALVQGRYPHIKDGEDLKDHSTESKKMENCLGESRHEVEKSEISEN

SEG

1ctl-

SEQ TDASGKIEKYNVPLNRLKMMFEKGEPTQTKILRAQSRSASGRKISENSYSLDDLEIGPGQ

SEG






WO 01/12659 



PCT/IB00/01496 



1ctl-

SEQ LSSSTFDSEKNESRRNLELPRLSETSIKDRMAKYQAAVSKQSSSTNYTNELKASGGEIKI

SEG

1ctl-

SEQ HKMEQKENVPPGPEVCITHQEGEKISANENSLAVRSTPAEDDSRDSQVKSEVQQPVHPKP

SEG x

1ctl-

SEQ LSPDSRASSLSESSPPKAMKKFQAPARETCVECQKTVYPMERLLANQQVFHISCFRCSYC

SEG xxxxxxxxxxxxxxxx

1ctl- ETTTTEEETTTCEEEETTEEEETTTTBTTTT

SEQ NNKLSLGTYASLHGRIYCKPHFNQLFKSKGNYDEGFGHRPHKDLWASKNENEEILERPAQ

SEG

1ctl- TCBCBTTBEEEETTEEEETTTTTTTTTTCCTTTTTTTCTTT

SEQ LANARETPHSPGVEDAPIAKVGVLAASMEAKASSQQEKEDKPAETKKLRIAWPPPTELGS

SEG

1ctl-

SEQ SGSALEEGIKMSKPKWPPEDEISKPEVPEDVDLDLKKLRRSSSLKERSRPFTVAASFQST

SEG xxxxxxxxxxxxxxxxxx

1ctl-

SEQ SVKSPKTVSPPIRKGWSMSEQSEESVGGRVAERKQVENAKASKKNGNVGKTTWQNKESKG

SEG

1ctl-

SEQ ETGKRSKEGHSLEMENENLVENGADSDEDDNSFLKQQSPQEPKSLNWSSFVDNTFAEEFT

SEG

1ctl-

SEQ TQNQKSQDVELWEGEVVKELSVEEQIKRNRYYDEDEDEE

SEG xxxxxxx

1ctl-



Prosite for DKFZphutel_18il9.1



PS00001  29->33    ASN_GLYCOSYLATION  PDOC00001
PS00001  59->63    ASN_GLYCOSYLATION  PDOC00001
PS00001  92->96    ASN_GLYCOSYLATION  PDOC00001
PS00001  251->255  ASN_GLYCOSYLATION  PDOC00001
PS00001  286->290  ASN_GLYCOSYLATION  PDOC00001
PS00001  706->710  ASN_GLYCOSYLATION  PDOC00001
PS00004  52->56    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  65->69    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  222->226  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  579->583  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  15->18    PKC_PHOSPHO_SITE   PDOC00005
PS00005  19->22    PKC_PHOSPHO_SITE   PDOC00005
PS00005  89->92    PKC_PHOSPHO_SITE   PDOC00005
PS00005  158->161  PKC_PHOSPHO_SITE   PDOC00005
PS00005  184->187  PKC_PHOSPHO_SITE   PDOC00005
PS00005  220->223  PKC_PHOSPHO_SITE   PDOC00005
PS00005  248->251  PKC_PHOSPHO_SITE   PDOC00005
PS00005  253->256  PKC_PHOSPHO_SITE   PDOC00005
PS00005  266->269  PKC_PHOSPHO_SITE   PDOC00005
PS00005  525->528  PKC_PHOSPHO_SITE   PDOC00005
PS00005  583->586  PKC_PHOSPHO_SITE   PDOC00005
PS00005  601->604  PKC_PHOSPHO_SITE   PDOC00005
PS00005  604->607  PKC_PHOSPHO_SITE   PDOC00005
PS00005  642->645  PKC_PHOSPHO_SITE   PDOC00005
PS00005  662->665  PKC_PHOSPHO_SITE   PDOC00005
PS00006  19->23    CK2_PHOSPHO_SITE   PDOC00006
PS00006  48->52    CK2_PHOSPHO_SITE   PDOC00006
PS00006  55->59    CK2_PHOSPHO_SITE   PDOC00006
PS00006  85->89    CK2_PHOSPHO_SITE   PDOC00006
PS00006  93->97    CK2_PHOSPHO_SITE   PDOC00006
PS00006  132->136  CK2_PHOSPHO_SITE   PDOC00006
PS00006  168->172  CK2_PHOSPHO_SITE   PDOC00006
PS00006  230->234  CK2_PHOSPHO_SITE   PDOC00006
PS00006  244->248  CK2_PHOSPHO_SITE   PDOC00006
PS00006  266->270  CK2_PHOSPHO_SITE   PDOC00006
PS00006  294->298  CK2_PHOSPHO_SITE   PDOC00006
PS00006  318->322  CK2_PHOSPHO_SITE   PDOC00006
PS00006  326->330  CK2_PHOSPHO_SITE   PDOC00006
PS00006  337->341  CK2_PHOSPHO_SITE   PDOC00006
PS00006  369->373  CK2_PHOSPHO_SITE   PDOC00006
PS00006  389->393  CK2_PHOSPHO_SITE   PDOC00006
PS00006  467->471  CK2_PHOSPHO_SITE   PDOC00006
PS00006  514->518  CK2_PHOSPHO_SITE   PDOC00006
PS00006  543->547  CK2_PHOSPHO_SITE   PDOC00006
PS00006  563->567  CK2_PHOSPHO_SITE   PDOC00006
PS00006  583->587  CK2_PHOSPHO_SITE   PDOC00006
PS00006  617->621  CK2_PHOSPHO_SITE   PDOC00006
PS00006  658->662  CK2_PHOSPHO_SITE   PDOC00006
PS00006  686->690  CK2_PHOSPHO_SITE   PDOC00006
PS00006  698->702  CK2_PHOSPHO_SITE   PDOC00006
PS00006  709->713  CK2_PHOSPHO_SITE   PDOC00006
PS00006  714->718  CK2_PHOSPHO_SITE   PDOC00006
PS00006  741->745  CK2_PHOSPHO_SITE   PDOC00006
PS00007  223->230  TYR_PHOSPHO_SITE   PDOC00007
PS00007  222->230  TYR_PHOSPHO_SITE   PDOC00007
PS00008  239->245  MYRISTYL           PDOC00008
PS00008  427->433  MYRISTYL           PDOC00008
PS00008  502->508  MYRISTYL           PDOC00008
PS00008  539->545  MYRISTYL           PDOC00008
PS00008  548->554  MYRISTYL           PDOC00008
PS00008  627->633  MYRISTYL           PDOC00008
PS00009  220->224  AMIDATION          PDOC00009
PS00009  662->666  AMIDATION          PDOC00009
PS00478  390->425  LIM_DOMAIN_1       PDOC00382



Pfam for DKFZphutel_18il9.1



HMM_NAME LIM domain containing proteins 

HMM *CagCNrpIyDREivMRAMNKvWHpECFrCcriCqqPLtegdeFYErDGrI 
C C++++Y+ E++ A+ V+H++CFRC+ C+ L+ G+ + ++ GRI 
Query 390 CVECQKTVYPMERLL-ANQQVFHISCFRCSYCNNKLSLGT-YASLHGRI 436 

HMM YCKhDYYrrFg* 

YCK+++ ++F+ 
Query 437 YCKPHFNQLFK 447

DKFZphutel_18i4

group: uterus derived

DKFZphutel_18i4 encodes a novel 220 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

weak similarity to C. elegans D2085.2
complete cDNA, complete cds, few EST hits 
Sequenced by AGOWA 
Locus: /map="7q31" 
Insert length: 1568 bp 

Poly A stretch at pos. 1551, polyadenylation signal at pos. 1523

1 GCCGAGCGGA GAGGGTAGAG ACGGGGTTTC ACCGTGTTAG CCAAGATGGT 
51 CTCGATCTCC TGACCTCGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTG 
101 GGATTACAGG CGTGAGCCAC TGCGCCCGGC CTGTTGTACA GTTATTAAAG 
151 TTATCATTTA ACATGGAAGA AGATGAGTTC ATTGGAGAAA AAACATTCCA 
201 ACGTTATTGT GCAGAATTCA TTAAACATTC ACAACAGATA GGTGATAGTT
251 GGGAATGGAG ACCATCAAAG GACTGTTCTG ATGGCTACAT GTGCAAAATA
301 CACTTTCAAA TTAAGAATGG GTCTGTGATG TCACATCTAG GAGCATCTAC
351 CCATGGACAG ACATGTCTTC CCATGGAGGA GGCTTTCGAG CTACCCTTGG
401 ATGATTGTGA AGTGATTGAA ACTGCAGCAG CGTCCGAAGT GATTAAATAT
451 GAGTATCATG TCTTATATTC CTGTAGCTAC CAAGTGCCTG TACTTTACTT
501 TAGGGCAAGC TTTTTAGATG GGAGACCTTT AACTCTGAAG GACATATGGG 
551 AAGGAGTTCA TGAGTGCTAT AAGATGCGAC TGCTACAGGG ACCATGGGAC 
601 ACTATTACGC AACAGGAACA TCCAATACTT GGGCAACCCT TTTTTGTACT 
651 TCATCCCTGC AAGACGAATG AATTCATGAC TCCTGTATTA AAGAATTCTC 
701 AGAAAATCAA TAAGAATGTC AACTATATCA CATCATGGCT GAGCATTGTA 
751 GGGCCAGTTG TTGGGCTGAA TCTACCTCTG AGTTATGCCA AAGCAACGTC 
801 TCAGGATGAA CGAAATGTCC CTTAACAAGA TTCTTCTATT GAGTTTAGGA 
851 ATTGCGGCAC GAAGAATGCC AAGAGTTTAC CTGGCCAGCC CTGGCTTTAA 
901 TAGGACTGAT ACCATGGAAT ATTTCATCTC ACCAAGATGT GACATGGATT 
951 ATTTTTCCCT TGGACACAAA TGTCTACAGC AACTGATGTT TGATAGGCTG
1001 AATGTTTAGA AGAAACACTT CAAAGGGATA CATCATGGCC AGGCATGGTG
1051 GCTCACACCT GTAATCCAAG CACTTTGGGA GGCCAAGGTG GGAGCATCAC
1101 TTGATCCTGG GAGTTCGAGA CCAGCCTGGG CAACATGGTG AAACCCTGTC
1151 GGTACAAAAA AATACAAAAA TTTGCCTGTT TATGGTGGTG TGTTCCTGTA 
1201 GTCCCAGCTC CCCAGGAGGC TGAGGTGGGA GGTTGGCTTT AACCCAGGAG 
1251 GCAGAGGTTG CAGTGAGCTG AGACTGTGCC ACTGCAGTCC AGCCTGGGTG 
1301 ACAGAGCCAG ACACTGTCTC GGGAAAAAAA AAAAAAAAAA AAAGACACAT 
1351 CACTATAAAT AGCAAAAAAA CAAATCTAAC TTATTAATAC TAGGAATACC 
1401 AACATTATTA GGGCACTTGC AGGTTATTCT TTTCTAGGCC AAGTACTTCA
1451 CTTCCATTTG TCTGACATGG AGATTGAGGG AGAAATGTAT TTGTGTGTTC
1501 ATTTTAATGT AAGATATATA AAAATTAAAT TACTGGATTT ACCTGTCCCT 
1551 GAAAAAAAAA AAAAAAAA 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 163 bp to 822 bp; peptide length: 220 
Category: similarity to unknown protein

1 MEEDEFIGEK TFQRYCAEFI KHSQQIGDSW EWRPSKDCSD GYMCKIHFQI
51 KNGSVMSHLG ASTHGQTCLP MEEAFELPLD DCEVIETAAA SEVIKYEYHV
101 LYSCSYQVPV LYFRASFLDG RPLTLKDIWE GVHECYKMRL LQGPWDTITQ
151 QEHPILGQPF FVLHPCKTNE FMTPVLKNSQ KINKNVNYIT SWLSIVGPVV
201 GLNLPLSYAK ATSQDERNVP



BLASTP hits 



Entry CED2085_2 from database TREMBL: 
"D2085.2"; Caenorhabditis elegans cosmid D2085 
Length = 173 

Score = 167 (58.8 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 36/121 (29%), Positives = 64/121 (52%) 



Alert BLASTP hits for DKFZphutel_18i4 , frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18i4, frame 1 



Report for DKFZphutel_18i4 . 1 



[LENGTH] 220 

[MW] 25278.99 

[pi] 5.34 

[HOMOL] TREMBL:CED2085_2 gene: "D2085.2"; Caenorhabditis elegans cosmid D2085 2e-11 

[BLOCKS] BL00221E 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 



SEQ MEEDEFIGEKTFQRYCAEFIKHSQQIGDSWEWRPSKDCSDGYMCKIHFQIKNGSVMSHLG 

PRD cccccccchhhhhhhhhhhhhhhhcccccccccccccccceeeeeeeeeeeccceeeeec 

SEQ ASTHGQTCLPMEEAFELPLDDCEVIETAAASEVIKYEYHVLYSCSYQVPVLYFRASFLDG 

PRD cccccccchhhhhhhhccccceeehhhhhchhhhhhhheeeeccccceeeeeeecccccc 

SEQ RPLTLKDIWEGVHECYKMRLLQGPWDTITQQEHPILGQPFFVLHPCKTNEFMTPVLKNSQ 

PRD cccccchhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccccccccccc 

SEQ KINKNVNYITSWLSIVGPVVGLNLPLSYAKATSQDERNVP 

PRD ccccccccccccceeeeccccccccceeeecccccccccc 



Prosite for DKFZphutel_18i4 . 1 



PS00001 52->56 ASN_GLYCOSYLATION PDOC00001 

PS00005 124->127 PKC_PHOSPHO_SITE PDOC00005 

PS00005 179->182 PKC_PHOSPHO_SITE PDOC00005 

PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 

PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 

PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006 

PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006 

PS00008 53->59 MYRISTYL PDOC00008 

PS00008 131->137 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_18i4.1) 
DKFZphutel_1811 



group: nucleic acid management 

DKFZphtes3_15j18 encodes a novel 184 amino acid protein with similarity to S. cerevisiae 
putative ribosomal protein YHR148w. 

The novel protein is similar to several 40S ribosomal proteins and therefore seems to be part of 
the corresponding ribosomal subunit. 

The new protein can find application in modulation of ribosome assembly, structure and 
function. 



strong similarity to S. cerevisiae YHR148w 
complete cDNA, complete cds, EST hits, 

potential start at Bp 45 matches Kozak consensus ANNatgG 
gene disruption of YHR148w is lethal! 
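
The Kozak match noted above can be checked against the first 50 nucleotides of the insert. The sketch below is illustrative only: the regular expression is one possible encoding of the ANNatgG consensus, and the constant and function names are hypothetical.

```python
import re

# First 50 nucleotides of the cDNA insert, copied from the sequence below
FIVE_PRIME = "GCGCGCTCTCAGCTTCGGGTCCTGCGGCTGCGGCTGCCGCCATCATGGTG"

# ANNatgG: A at position -3, any two bases, ATG, then G at position +4.
# The lookahead keeps overlapping candidates from being skipped.
KOZAK = re.compile(r"(?=A[ACGT]{2}ATGG)")


def kozak_start_positions(seq: str) -> list[int]:
    """Return 1-based positions of the A of each ATG in a Kozak context."""
    return [m.start() + 4 for m in KOZAK.finditer(seq.upper())]


print(kozak_start_positions(FIVE_PRIME))
```

With this encoding the only candidate in the fragment is the ATG at position 45, which agrees with the start of the reported ORF.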

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1076 bp 

Poly A stretch at pos. 1035, polyadenylation signal at pos. 1006 



1 GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG 

51 CGGAAGCTTA AGTTCCACGA GCAGAAGCTG CTGAAGCAGG TGGACTTCCT 

101 GAACTGGGAG GTCACCGACC ACAACCTGCA CGAGCTGCGC GTGCTGCGGC 

151 GTTACCGGCT GCAGCGGCGG GAGGACTACA CGCGCTACAA CCAGCTGAGC 

201 CGTGCCGTGC GTGAGCTGGC GCGGCGCCTG CGCGACCTGC CCGAACGCGA 

251 CCAGTTCCGC GTGCGCGCTT CGGCCGCGCT GCTGGACAAG CTGTATGCTC 

301 TCGGCTTGGT GCCCACGCGC GGTTCGCTGG AGCTCTGCGA CTTCGTCACG 

351 GCCTCGTCCT TCTGCCGCCG CCGCCTCCCC ACCGTGCTCC TCAAGCTGCG 

401 CATGGCGCAG CACCTTCAGG CTGCCGTGGC CTTTGTGGAG CAAGGGCACG 

451 TACGCGTGGG CCCTGACGTG GTTACCGACC CCGCCTTCCT TGTCACGCGC 

501 AGCATGGAGG ACTTTGTCAC TTGGGTGGAC TCGTCCAAGA TCAAGCGGCA 

551 CGTGCTAGAG TACAATGAGG AGCGCGATGA CTTCGATCTG GAAGCCTAGC 

601 GGATCTCCCA CTTTGCATGG CTGTCTTTTA CAGATGGGAA AACTGAGGCC 

651 TGATGCTGGA GATTCTATGA GGGTGCTCTC CTCAAGGGTA TCAGACGGTC 

701 GTAGGTTCTT AAGAATTTGA TTCATCAGTG GCAGGCCATG CATAGAGCCA 

751 CGGGAGGTGC GTCCTTGTTT TCCAGGAAAT GTTCTTAGAA CTTGGACTAC 

801 TGATTATTAA TTGACTGTGC CTTGGGAAAC AGTGGGAAGT AACTTGGTGC 

851 AGCACTGGGG TATTGTTGGA CTGGTTCAAT TCGTTTAACT CGAATTCTTG 

901 CTCCTGGCCG TGGTTAAGCT GTGTACAGAT GATGGAGAGT TTGGCCTCAA 

951 GTTTTTATAA ACTGAGCGAG ACTAGTGTTC AGGATCTCCT CCCTTGTTTA 

1001 AATGTCAATA AATGCCCCAA CTGCTTTGTA AGCTCAAAAA AAAAAAAAAA 

1051 AAAAAAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 45 bp to 596 bp; peptide length: 184 
Category: strong similarity to known protein 



1 MVRKLKFHEQ KLLKQVDFLN WEVTDHNLHE LRVLRRYRLQ RREDYTRYNQ 

51 LSRAVRELAR RLRDLPERDQ FRVRASAALL DKLYALGLVP TRGSLELCDF 

101 VTASSFCRRR LPTVLLKLRM AQHLQAAVAF VEQGHVRVGP DVVTDPAFLV 

151 TRSMEDFVTW VDSSKIKRHV LEYNEERDDF DLEA 
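
The start of the peptide can be reproduced by translating the cDNA from position 45. The sketch below assumes the standard genetic code and uses only the first 100 nucleotides of the insert; the names are hypothetical and the code is not the annotation pipeline used for this listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 100 nucleotides of the insert, copied from the sequence above
DNA_5PRIME = (
    "GCGCGCTCTCAGCTTCGGGTCCTGCGGCTGCGGCTGCCGCCATCATGGTG"
    "CGGAAGCTTAAGTTCCACGAGCAGAAGCTGCTGAAGCAGGTGGACTTCCT"
)


def translate(dna: str, start_bp: int) -> str:
    """Translate complete codons from a 1-based start position."""
    frame = dna[start_bp - 1:].upper()
    codons = (frame[i:i + 3] for i in range(0, len(frame) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


print(translate(DNA_5PRIME, 45))
```

The 18 complete codons available in this fragment give MVRKLKFHEQKLLKQVDF, matching the first 18 residues listed for frame 3.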

BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_1811, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_1811, frame 3 

Report for DKFZphutel_1811 . 3 



[LENGTH] 184 

[MW] 21850.21 

[pi] 9.54 

[HOMOL] PIR:S33911 probable ribosomal protein YHR148w - yeast (Saccharomyces cerevisiae) 4e-47 

[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YHR148w] 2e-48 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL081w] 5e-07 

[FUNCAT] j mrna translation and ribosome biogenesis [M. jannaschii, MJ0190] 8e-05 

[BLOCKS] BL00632 

[PIRKW] cytosol 1e-07 

[PIRKW] ribosome 1e-07 

[PIRKW] protein biosynthesis 1e-07 

[SUPFAM] rat ribosomal protein S9 1e-07 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PFAM] Ribosomal protein S4 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 6.52 % 



SEQ MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYNQLSRAVRELAR 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLRDLPERDQFRVRASAALLDKLYALGLVPTRGSLELCDFVTASSFCRRRLPTVLLKLRM 

SEG 

PRD hhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ AQHLQAAVAFVEQGHVRVGPDVVTDPAFLVTRSMEDFVTWVDSSKIKRHVLEYNEERDDF 

SEG 

PRD hhhhhhhhhhhhhhhccccceeecccceeeeeccccceeeeeccchhhhhhhhhcccccc 

SEQ DLEA 

SEG 

PRD cccc 
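
The LOW_COMPLEXITY keyword in the report appears to be the share of residues masked with x in the SEG track. The sketch below assumes that 12 residues are masked in the 184-residue peptide, as read from the SEG line above; the function name is hypothetical.

```python
def low_complexity_percent(masked_residues: int, peptide_length: int) -> float:
    """Percentage of the peptide masked as low complexity, to two decimals."""
    if peptide_length <= 0:
        raise ValueError("peptide length must be positive")
    return round(100 * masked_residues / peptide_length, 2)


print(low_complexity_percent(12, 184))
```

Under that assumption the result agrees with the 6.52 % given in the report.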



Prosite for DKFZphutel_1811 . 3 

PS00005 163->166 PKC_PHOSPHO_SITE PDOC00005 

PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006 

PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 

PS00007 41->49 TYR_PHOSPHO_SITE PDOC00007 

PS00008 87->93 MYRISTYL PDOC00008 



Pfam for DKFZphutel_1811 . 3 



HMM_NAME Ribosomal protein S4 

HMM *MSR. YRGPRWKIIRRPGElPWLTnK tklmrkYC. . lRPgQHgWR 

M+R ++ +++K+++++++L W ++++R Y R+++ ++ 

Query 1 MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYN 49 

HMM qRktLs KIRRmSQYr I RLQEKQKLRFMYGNI tERQLRRYvRiaEdKRKl D 

Q + +R +++ + L+E + +R +++++L++++ +++ L 

Query 50 QLSR--AVRELARRLRDLPERDQFRVRASAALLDKLYALGLVP-TRGSLE 96 

HMM YsTGenLMQILEMRLDNIVFRMGMAPTIHHARQLINHRHIRVNdRIVNIP 
++ + ++++RL++++ ++ MA ++A+ +++++H+RV++ +V++P 
Query 97 LCDFVTASSFCRRRLPTVLLKLRMAQHLQAAVAFVEQGHVRVGPDVVTDP 146 

HMM SYiCRPNDilSIRDkqrMQsHlkWnieSPegrmRPNHLErNnkkYeGtIN 
++++++ + +++++W++ S+ ++R+ + Y+ + 

Query 147 AFLVTRSMEDFVTWVDSSKIKRHVLEYNEERD 178 

HMM rllEReWiplklNElLVVEY* 
+++ + 

Query 179 DFDLE 183 
DKFZphutel_19f19 



group: transmembrane protein 

DKFZphutel_19f19 encodes a novel 204 amino acid protein with similarity to murine p24 protein. 

Murine p24 is expressed only in brain, where it is localized exclusively in neurons. It seems 
to be a neuron-specific membrane protein localized in intracellular organelles of highly 
differentiated neural cells and may play a role in the neural organelle transport system. Like 
p24, the novel protein contains 2 transmembrane regions, but it does not contain the sequence 
homologous to the microtubule-binding domain of microtubule-associated proteins that is present in 
p24. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 



similarity to mouse P24 protein ; 
membrane regions: 2 

Summary DKFZphutel_19f19 encodes a novel 204 amino acid protein, with 
similarity to mouse P24 protein. 



similarity to mouse P24 protein 

complete cDNA, complete cds, EST hits, 
2 TM-domains 



Sequenced by AGOWA 

Locus: map=14.8 cR from top of Chr20 linkage group 



Insert length: 2042 bp 

Poly A stretch at pos. 1958, polyadenylation signal at pos. 1940 



1 GCAGGCAGAG AGATGAGGAA ACTGAGACCC AGAAAGGTGG AAGCACTTGT 

51 CTAAGGTCAC GCCTCCAGGA AGCAGTGTGT CCACGACTCC AGTCCAAGTG 

101 GTCAGGCTCC AGAGCCCACA GTCCCAGGGG TCCATGATGC CGAGCTGCAA 

151 TCGTTCCTGC AGCTGCAGCC GCGGCCCCAG CGTGGAGGAT GGCAAGTGGT 

201 ATGGGGTCCG CTCCTACCTG CACCTCTTCT ATGAGGACTG TGCAGGCACT 

251 GCTCTCAGCG ACGACCCTGA GGGACCTCCG GTCCTGTGCC CCCGCCGGCC 

301 CTGGCCCTCA CTGTGTTGGA AGATCAGCCT GTCCTCGGGG ACCCTGCTTC 

351 TGCTGCTGGG TGTGGCGGCT CTGACCACTG GCTATGCAGT GCCCCCCAAG 

401 CTGGAGGGCA TCGGTGAGGG TGAGTTCCTG GTGTTGGATC AGCGGGCAGC 

451 CGACTACAAC CAGGCCCTGG GCACCTGTCG CCTGGCAGGC ACAGCGCTCT 

501 GTGTGGCAGC TGGAGTTCTG CTCGCCATCT GCCTCTTCTG GGCCATGATA 

551 GGCTGGCTGA GCCAGGACAC CAAGGCAGAG CCCTTGGACC CCCAAGCCGA 

601 CAGCCACGTG GAGGTCTTCG GGGATGAGCC AGAGCAGCAG TTGTCACCCA 

651 TTTTCCGCAA TGCCAGTGGC CAGTCATGGT TCTCGCCACC CGCCAGCCCC 

701 TTTGGGCAAT CTTCTGTGCA GACTATCCAG CCCAAGAGGG ACTCCTGAGC 

751 TGCCCACATG GCCTAAGATG TGGGTCCTGG ATCCTTCCCC CTTCTCACCA 

801 TAACCCCCTC TCAGTGTTTC CCCAACTTCT CCCTTTAGAG CCCAACTCCA 

851 GGTCAAATCT GGAGCTCAAA TCCCAGTGCT CCCTCCCCAG GAGTGGGGCC 

901 CCAACTCTTC CAAGATACCA GCATTCCTCA AGTCCTCCCA AAACTTCCTA 

951 CCCACACCCT CTTCCCAAGG CCCTCAGGGG CAGAAAACAT CTCCTTCAAC 

1001 CCGTCCCCAC TCCTTCCTCT GCATGACCTT GGGCAAACCC TTGCCCTTTC 

1051 AAGCCATCAG CTCCTGCCTC TCTGCCATGA GGGCTTTGGA TCAGATTCCT 

1101 CTTCTCGCCA GGATGAGGAC ACGCACTGCC CTCCATAGAC ACAGATGAAG 

1151 GGGTGGGGGT CATTCAGCTC GAATGGGTCC CAGATGCTCA CTTGGCCTTT 

1201 CCCTGCAGGA TGAGTGAAGA CGTTTGCCTC TCACAGTGTG TCTTCTACCT 

1251 GCATTTTGGC ATCAGAGCCC CCCAGCCCAC CCACCACAGG CAATTACTAG 

1301 CCCTAGTTGA TAGGTGAGGT GGGTGAAGAA GGCTGGAGGT GACATGTCCG 

1351 AGGTCACACA ACAAAGCAGC ATGCAGGAAC TAGAAACACA TCTTCAGCCT 

1401 CCTCCTGGGC CAGCTCTTGT GCTACAGGTG GGGCGGAGCC AGCCCCTCAC 

1451 CTTCCTGGTT CCCTGAGGGT CCTCAGGGTG GAGGACAGGT TTGGCCCAGA 

1501 AAGACTAGCC AGAGGCCTGA TGGTCCCAGG TGGCTCTGGA TATACTTTGG 

1551 ATATGGATTT AAATGGTCTC TAAGAGCCGG GGGTAGGGGG CAGGAAAAGT 

1601 GGGTTGTCTT TGCCCCTCAA AGTCCACCTA CCTAGAAACC AAGCCCACGG 

1651 TCTTGGCCGT GACCCTGATA ATAAATGGGC TCTCTCAGAG GCGCCAGCCC 

1701 CTCCCTCCCC AGCCGGAGGC GTCATCTCTC TTCTGTACCA CTAGAGGGAG 

1751 CTCTGATGCA GCTGGAGAGC AGCGCTCAAG GCTCTCGCCC CTCCCCTCCC 

1801 TAACCCTTAC CTTCAGTCTC CACCAGCCTG AAGGGCCTCC TAGGGGATCC 

1851 TCAGGCGGCC CCCACCAGGG CACACCCTAC TGTCCTTGTG CCTCACGCCC 

1901 CCTCCTCATC CTGCACCCCT TCCATCCCAC CTTCCCTTTC AATAAACAGC 

1951 TGGGATGGAA AAAAAAAAAA AGAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2001 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 
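
The reported polyadenylation signal position can be compared with the location of the canonical AATAAA hexamer. The sketch below searches the 50 nucleotides that begin at position 1901 of the sequence above. It assumes the reported value is a zero-based offset from the start of the insert, an interpretation inferred from the listing rather than stated in it; the names are hypothetical.

```python
# Nucleotides 1901-1950 of the insert, copied from the sequence above
FRAGMENT_1901 = "CCTCCTCATCCTGCACCCCTTCCATCCCACCTTCCCTTTCAATAAACAGC"


def polya_signal_offset(fragment: str, fragment_start_bp: int,
                        signal: str = "AATAAA") -> int:
    """Zero-based offset of the first signal hexamer within the full insert.

    fragment_start_bp is the 1-based position of the first fragment base.
    """
    idx = fragment.upper().find(signal)
    if idx < 0:
        raise ValueError("no polyadenylation signal found")
    return fragment_start_bp - 1 + idx


print(polya_signal_offset(FRAGMENT_1901, 1901))
```

The hexamer starts at 1-based position 1941, that is, at offset 1940, which agrees with the position given for this clone.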
BLAST Results 



Entry HS417348 from database EMBL: 
human STS WI-14697. 
Length = 290 
Minus Strand HSPs: 

Score = 1254 (188.2 bits), Expect = 3.0e-50, P = 3.0e-50 
Identities = 262/273 (95%) 



Medline entries 



97334404: 

A newly identified membrane protein localized exclusively in 
intracellular organelles of neurons. 



Peptide information for frame 2 



ORF from 134 bp to 745 bp; peptide length: 204 
Category: similarity to known protein 



1 MMPSCNRSCS CSRGPSVEDG KWYGVRSYLH LFYEDCAGTA LSDDPEGPPV 
51 LCPRRPWPSL CWKISLSSGT LLLLLGVAAL TTGYAVPPKL EGIGEGEFLV 
101 LDQRAADYNQ ALGTCRLAGT ALCVAAGVLL AICLFWAMIG WLSQDTKAEP 
151 LDPEADSHVE VFGDEPEQQL SPIFRNASGQ SWFSPPASPF GQSSVQTIQP 
201 KRDS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19f19, frame 2 

TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, 
complete cds., N = 1, Score = 295, P = 3.8e-26 



>TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, 
complete cds. 

Length = 196 

HSPs: 

Score = 295 (44.3 bits), Expect = 3.8e-26, P = 3.8e-26 
Identities = 58/139 (41%), Positives = 81/139 (58%) 

Query: 2 MPSCNRSCSCSRGPSVEDGKW---YGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWP 58 

M SC+ +C R + +G + YGVRSYLH FYEDC + + + P R W 

Sbjct: 1 MTSCSNTCGSRRAQADTEGGYQQRYGVRSYLHQFYEDCTASIWEYEDDFQIQRSPNR-WS 59 

Query: 59 SLCWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLA 118 

S+ WK+ L SGT+ ++LG+ L G+ VPPK+E GE +F+V+D A YN AL TC+LA 
Sbjct: 60 SVFWKVGLISGTVFVILGLTVLAVGFLVPPKIEAFGEADFMVVDTHAVKYNGALDTCKLA 119 

Query: 119 GTALCVAAGVLLAICLFWAM 138 

G L G +A CL ++ 
Sbjct: 120 GAVLFCIGGTSMAGCLLMSV 139 



Pedant information for DKFZphutel_19f19, frame 2 



Report for DKFZphutel_19f19.2 



[LENGTH] 204 

[MW] 21983.07 

[pi] 4.69 

[HOMOL] TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, complete 
cds. 7e-19 

[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] TRANSMEMBRANE 2 

[KW] LOW COMPLEXITY 10.29 % 



SEQ MMPSCNRSCSCSRGPSVEDGKWYGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWPSL 

SEG 

PRD cccccccccccccccccccccceeehhhhhccccccccccccccccccccccccccccce 

MEM MM 

SEQ CWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLAGT 

SEG . . . . xxxxxxxxxxxxxxxxxxxxx 

PRD eeeeeccccceeecccceeeecccccccccccccccceeeecccccccchhhhhhhhchh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMM 

SEQ ALCVAAGVLLAICLFWAMIGWLSQDTKAEPLDPEADSHVEVFGDEPEQQLSPIFRNASGQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccccccceeeeccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ SWFSPPASPFGQSSVQTIQPKRDS 

SEG 

PRD ccccccccccccceeeeccccccc 

MEM 



Prosite for DKFZphutel_19f19.2 



PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 

PS00001 176->180 ASN_GLYCOSYLATION PDOC00001 

PS00004 201->205 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005 

PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 

PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 

PS00006 157->161 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 92->98 MYRISTYL PDOC00008 

PS00008 119->125 MYRISTYL PDOC00008 

PS00008 127->133 MYRISTYL PDOC00008 
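
The two ASN_GLYCOSYLATION entries (PS00001) can be reproduced by scanning the peptide for the N-glycosylation pattern N-{P}-[ST]-{P}. The sketch below encodes that pattern as a regular expression and reports 1-based start positions. The names are hypothetical, and the end coordinates in this listing (start + 4) appear to follow a convention of the annotation tool that is not modelled here.

```python
import re

# 204-residue peptide of frame 2, copied from the SEQ lines of the report
PEPTIDE = (
    "MMPSCNRSCSCSRGPSVEDGKWYGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWPSL"
    "CWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLAGT"
    "ALCVAAGVLLAICLFWAMIGWLSQDTKAEPLDPEADSHVEVFGDEPEQQLSPIFRNASGQ"
    "SWFSPPASPFGQSSVQTIQPKRDS"
)

# N-{P}-[ST]-{P}: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# The lookahead allows overlapping sites to be reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_starts(peptide: str) -> list[int]:
    """Return 1-based start positions of potential N-glycosylation sites."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(peptide)]


print(n_glycosylation_starts(PEPTIDE))
```

The scan finds sites starting at residues 6 and 176, the same start positions as in the table.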



(No Pfam data available for DKFZphutel_19f19.2) 






WO 01/12659 



PCT/IB00/01496 



DKFZphutel_19g19 



group: uterus derived 

DKFZphutel_19g19 encodes a novel 400 amino acid protein with strong but partial similarity to 
a bovine elastin-related protein expressed in fetal calf ligamentum nuchae. 

The novel protein contains 2 RGD cell attachment sites. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 
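
The two RGD cell attachment sites can be located with a plain substring search. The sketch below uses residues 201 to 300 of the peptide listed further down for this clone and converts the hits to 1-based positions in the full-length protein; the names are hypothetical.

```python
# Residues 201-300 of the 400-residue peptide (frame 2)
FRAGMENT_201 = (
    "DNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLRVSFSYAGLSG"
    "DDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE"
)


def motif_positions(fragment: str, fragment_start: int, motif: str = "RGD") -> list[int]:
    """Return 1-based positions of every motif occurrence in the full protein."""
    positions = []
    idx = fragment.find(motif)
    while idx >= 0:
        positions.append(fragment_start + idx)
        idx = fragment.find(motif, idx + 1)
    return positions


print(motif_positions(FRAGMENT_201, 201))
```

The search returns positions 221 and 268, which are the start coordinates of the two PS00016 entries in the Prosite table for this clone.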



similarity to bovine elastin fragment 
complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 

Locus: map=54.9 cR from top of Chr3 linkage group 
Insert length: 3244 bp 

Poly A stretch at pos. 3227, polyadenylation signal at pos. 3216 



1 GTAACTGCAG TAAGTCCCGC TTGGCCCTGG AGTCCACGCG GATTTTCGAA 
51 GCTGGGGCTG GCAAGAGGCC GCTGGACACC ACGCTCCAGT CGTCAGCCCA 
101 CTTCCTAGCT GAACAGCGCG AGGCGGCCGC AGCGAGCCGG GTCCCACCAT 
151 GGCCGCGAAT TATTCCAGTA CCAGTACCCG GAGAGAACAT GTCAAAGTTA 
201 AAACCAGCTC CCAGCCAGGC TTCCTGGAAC GGCTGAGCGA GACCTCGGGT 
251 GGGATGTTTG TGGGGCTCAT GGCCTTCCTG CTCTCCTTCT ACCTAATTTT 
301 CACCAATGAG GGCCGCGCAT TGAAGACGGC AACCTCATTG GCTGAGGGGC 
351 TCTCGCTTGT GGTGTCTCCT GACAGCATCC ACAGTGTGGC TCCGGAGAAT 
401 GAAGGAAGGC TGGTGCACAT CATTGGCGCC TTACGGACAT CCAAGCTTTT 
451 GTCTGATCCA AACTATGGGG TCCATCTTCC GGCTGTGAAA CTGCGGAGGC 
501 ACGTGGAGAT GTACCAATGG GTAGAAACTG AGGAGTCCAG GGAGTACACC 
551 GAGGATGGGC AGGTGAAGAA GGAGACGAGG TATTCCTACA ACACTGAATG 
601 GAGGTCAGAA ATCATCAACA GCAAAAACTT CGACCGAGAG ATTGGCCACA 
651 ATAACCCCAG TGCCATGGCA GTGGAGTCAT TCACGGCAAC AGCCCCCTTT 
701 GTCCAAATTG GCAGGTTTTT CCTCTCGTCA GGCCTCATCG ACAAAGTCGA 
751 CAACTTCAAG TCCCTGAGCC TATCCAAGCT GGAGGACCCT CATGTGGACA 
801 TCATTCGCCG TGGAGACTTT TTCTACCACA GCGAAAATCC CAAGTATCCA 
851 GAGGTGGGAG ACTTGCGTGT CTCCTTTTCC TATGCTGGAC TGAGCGGCGA 
901 TGACCCTGAC CTGGGCCCAG CTCACGTGGT CACTGTGATT GCCCGGCAGC 
951 GGGGTGACCA GCTAGTCCCA TTCTCCACCA AGTCTGGGGA TACCTTACTG 
1001 CTCCTGCACC ACGGGGACTT CTCAGCAGAG GAGGTGTTTC ATAGAGAACT 
1051 AAGGAGCAAC TCCATGAAGA CCTGGGGCCT GCGGGCAGCT GGCTGGATGG 
1101 CCATGTTCAT GGGCCTCAAC CTTATGACAC GGATCCTCTA CACCTTGGTG 
1151 GACTGGTTTC CTGTTTTCCG AGACCTGGTC AACATTGGCC TGAAAGCCTT 
1201 TGCCTTCTGT GTGGCCACCT CGCTGACCCT GCTGACCGTG GCGGCTGGCT 
1251 GGCTCTTCTA CCGACCCCTG TGGGCCCTCC TCATTGCCGG CCTGGCCCTT 
1301 GTGCCCATCC TTGTTGCTCG GACACGGGTG CCAGCCAAAA AGTTGGAGTG 
1351 AAAAGACCCT GGCACCCGCC CGACACCTGC GTGAGCCCTA GGATCCAGGT 
1401 CCTCTCTCAC CTCTGACCCA GCTCCATGCC AGAGCAGGAG CCCCGGTCAA 
1451 TTTTGGACTC TGCACCCCCT CTCCTCTTCA GGGGCCAGAC TTGGCAGCAT 
1501 GTGCACCAGG TTGGTGTTCA CCAGCTCATG TCTTCCCCAC ATCTCTTCTT 
1551 GCCAGTAAGC AGCTTTGGTG GGCAGCAGCA GCCATGAATG GCAAGCTGAC 
1601 AGCTTCTCCT GCTGTTTCCT TCCTCTCTTG GACTGAGTGG GTACGGCCAG 
1651 CCACTCAGCC CATTGGCAGC TGACAACGCA GACACGCTCT ACGGAGGCCT 
1701 GCTGATAAAG GGCTCAGCCT TGCCGTGTGC TGCTTCTCAT CACTGCACAC 
1751 AAGTGCCATG CTTTGCCACC ACCACCAAGC ACATCTGTGA TCCTGAAGGG 
1801 CGGCCGTTAG TCATTACTGC TGAGTCCTGG GTCACCAGCA GACACACTGG 
1851 GCATGGACCC CTCAAAGCAG GCACACCCAA AACACAAGTC TGTGGCTAGA 
1901 ACCTGATGTG GTGTTTAAAA GAGAAGAAAC ACTGAAGATG TCCTGAGGAG 
1951 AAAAGCTGGA CATATACTGG GCTTCACACT TATCTTATGG CTTGGCAGAA 
2001 TCTTTGTAGT GTGTGGGATC TCTGAAGGCC CTATTTAAGT TTTTCTTCGT 
2051 TACTTTGCTG CTTCATGTGT ACTTTCCTAC CCCAAGAGGA AGTTTTCTGA 
2101 AATAAGATTT AAAAACAAAA CAAAAAAAAC ACTTAATATT TCAGACTGTT 
2151 ACAGGAAACA CCCTTTAGTC TGTCAGTTGA ATTCAGAGCA CTGAAAGGTG 
2201 TTAAATTGGG GTATGTGGTT TGATTGATAA AAAGTTACCT CTCAGTATTT 
2251 TGTGTCACTG AGAAGCTTTA CAATGGATGC TTTTGAAACA AGTATCAGCA 
2301 AAAGGATTTG TTTTCACTCT GGGAGGAGAG GGTGGAGAAA GCACTTGCTT 
2351 TCATCCTCTG GCATCGGAAA CTCCCCTATG CACTTGAAGA TGGTTTAAAA 
2401 GATTAAAGAA ACGATTAAGA GAAAAGGTTG GAAGCTTTAT ACTAAATGGG 
2451 CTCCTTCATG GTGACGCCCC GTCAACCACA ATCAAGAACT GAGGCCTGAG 
2501 GCTGGTTGTA CAATGCCCAC GCCTGCCTGG CTGCTTTCAC CTGGGAGTGC 
2551 TTTCGATGTG GGCACCTGGG CTTCCTAGGG CTGCTTCTGA GTGGTTCTTT 
2601 CACGTGTTGT GTCCATAGCT TTAGTCTTCC TAAATAAGAT CCACCCACAC 
2651 CTAAGTCACA GAATTTCTAA GTTCCCCAAC TACTCTCACA CCCTTTTAAA 
2701 GATAAAGTAT GTTGTAACCA GGATGTCTTA AATGATTCTT TGTGTACCTT 
2751 TTCTGTCATA TTCAGAAACC GTTTTGTGCC TGCTGGGAGT AATTCCTTTA 
2801 GCAATTAAGT ATTTGGTAGC TGAATAAGGG GTCAGAACTT CTGAAACCAG 
2851 AGATCTGTAA TCATCTCTAT TGGCCTGGGG TGCCTGTGCT ATAAATGAGT 
2901 TTCTTCACAT GAAAAACACA GCCAGCCCAA GATGACTTAT CTGGGTTTAG 
2951 GATTCAATAG TATTCACTAA CTGCTTATTA CATGAGCAAT TTCATCAAAT 
3001 CTCCAAACTC TTAAAGGATG CTTTCGGAAA ACACGCTGTA TACCTAGATG 
3051 ATGACTAAAT GCAAAATCCT TGGGCTTTGG TTTTTTTCTA GTAAGGATTT 
3101 TAAATAACTG CCGACTTCAA AAGTGTTCTT AAAACGAAAG ATAATGTTAA 
3151 GAAAAATTTG AAAGCTTTGG AAAACCAAAT TTGTAATATC ATTGTATTTT 
3201 TTATTAAAAG TTTTGTAATA AATTTCTAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS545355 from database EMBL: 
human STS WI-14815. 
Length = 436 
Minus Strand HSPs: 

Score = 2040 (306.1 bits), Expect = 6.2e-86, P = 6.2e-86 
Identities = 420/426 (98%) 

Entry HS932147 from database EMBL: 
human STS WI-8531. 
Length = 341 
Minus Strand HSPs: 

Score = 1705 (255.8 bits), Expect = 4.7e-70, P = 4.7e-70 
Identities = 341/341 (100%) 



Medline entries 



86051793: 

Bovine elastin cDNA clones: evidence for the occurrence of a 
new elastin-related protein in fetal calf ligamentum nuchae. 



Peptide information for frame 2 



ORF from 149 bp to 1348 bp; peptide length: 400 
Category: similarity to known protein 



1 MAANYSSTST RREHVKVKTS SQPGFLERLS ETSGGMFVGL MAFLLSFYLI 

51 FTNEGRALKT ATSLAEGLSL VVSPDSIHSV APENEGRLVH IIGALRTSKL 

101 LSDPNYGVHL PAVKLRRHVE MYQWVETEES REYTEDGQVK KETRYSYNTE 

151 WRSEIINSKN FDREIGHNNP SAMAVESFTA TAPFVQIGRF FLSSGLIDKV 

201 DNFKSLSLSK LEDPHVDIIR RGDFFYHSEN PKYPEVGDLR VSFSYAGLSG 

251 DDPDLGPAHV VTVIARQRGD QLVPFSTKSG DTLLLLHHGD FSAEEVFHRE 

301 LRSNSMKTWG LRAAGWMAMF MGLNLMTRIL YTLVDWFPVF RDLVNIGLKA 

351 FAFCVATSLT LLTVAAGWLF YRPLWALLIA GLALVPILVA RTRVPAKKLE 

BLASTP hits 

Entry 145887 from database PIR: 
elastin - bovine (fragment) 
Length = 40 

Score = 131 (46.1 bits), Expect = 4.9e-08, P = 4.9e-08 
Identities = 31/41 (75%), Positives = 34/41 (82%) 



Alert BLASTP hits for DKFZphutel_19g19, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_19g19, frame 2 



Report for DKFZphutel_19g19.2 



[LENGTH] 400 

[MW] 44831.53 

[pi] 7.23 

[HOMOL] PIR:145887 elastin - bovine (fragment) 1e-06 

[PROSITE] RGD 2 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 

[PROSITE] CK2_PHOSPHO_SITE 

[PROSITE] TYR_PHOSPHO_SITE 

[PROSITE] PKC_PHOSPHO_SITE 

[PROSITE] ASN_GLYCOSYLATION 

[KW] TRANSMEMBRANE 4 



SEQ MAANYSSTSTRREHVKVKTSSQPGFLERLSETSGGMFVGLMAFLLSFYLIFTNEGRALKT 

PRD ccceeecccceeeeeeeecccccceeeecccccccchhhhhhhhhhheeeeecccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . 

SEQ ATSLAEGLSLVVSPDSIHSVAPENEGRLVHIIGALRTSKLLSDPNYGVHLPAVKLRRHVE 

PRD hhhhhccceeeeccccceeeeccccceeeeeeeeeeceeeccccccccccchhhhhhhhh 

MEM 

SEQ MYQWVETEESREYTEDGQVKKETRYSYNTEWRSEIINSKNFDREIGHNNPSAMAVESFTA 

PRD hheeehhhhheeecccccccceeeccccccceeeeeeccccceeecccccceeeeeeecc 

MEM M 

SEQ TAPFVQIGRFFLSSGLIDKVDNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLR 

PRD ccceeeeeeeeeccccccccccceeeeeeeccccceeeeecccceeecccccccccccee 

MEM MMMMMMMMMMMMMMMMM 

SEQ VSFSYAGLSGDDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE 

PRD eeccccccccccccccceeeeeeeeecccccccccccccceeeeeecccccchhhhhhhh 

MEM 

SEQ LRSNSMKTWGLRAAGWMAMFMGLNLMTRILYTLVDWFPVFRDLVNIGLKAFAFCVATSLT 

PRD hhccccccccchhhhhhhhhhhchhhhhhhhheeecccccccccccceeeeeeeeehhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM MMMM 

SEQ LLTVAAGWLFYRPLWALLIAGLALVPILVARTRVPAKKLE 

PRD hhhhhccceeehhhhhhhhhhhhchhhhhhhhcccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphutel_19g19.2 



PS00001 4->8 ASN_GLYCOSYLATION PDOC00001 

PS00004 140->144 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 

PS00005 10->13 PKC_PHOSPHO_SITE PDOC00005 

PS00005 97->100 PKC_PHOSPHO_SITE PDOC00005 

PS00005 276->279 PKC_PHOSPHO_SITE PDOC00005 

PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 

PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006 

PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006 

PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 

PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006 

PS00006 332->336 CK2_PHOSPHO_SITE PDOC00006 

PS00007 220->227 TYR_PHOSPHO_SITE PDOC00007 

PS00007 99->107 TYR_PHOSPHO_SITE PDOC00007 

PS00008 35->41 MYRISTYL PDOC00008 

PS00008 93->99 MYRISTYL PDOC00008 

PS00008 310->316 MYRISTYL PDOC00008 

PS00016 221->224 RGD PDOC00016 

PS00016 268->271 RGD PDOC00016 



(No Pfam data available for DKFZphutel_19g19.2) 
DKFZphutel_19g22 



group: cell structure and motility 

DKFZphutel_19g22 encodes a novel 390 amino acid protein with very strong similarity to 
tuftelin/enamelin. 

Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved in 
calcification, these proteins are also expressed in the uterus matrix. 

The new protein can find application in modulation of tissue calcification, especially in the 
uterus. 



complete cDNA, complete cds start at Bp 51, EST hits in 3' UTR, 
human homolog of mouse tuftelin 

tuftelin is described as a matrix protein of teeth but it seems also 
to be present in the uterus matrix 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 3110 bp 

Poly A stretch at pos. 3093, polyadenylation signal at pos. 3071 



1 GCAGACAGCG GGGTGGACAA GTGGCGTGTG TGCTGCGACC CCGAGGGAAG 
51 ATGAACGGGA CGCGGAACTG GTGTACCCTG GTGGACGTGC ACCCAGAGGA 
101 CCAGGCGGCG GGCAGCGTGG ACATTCTCAG GCTGACTCTC CAGGGTGAAC 
151 TGACAGGAGA TGAACTTGAA CACATAGCCC AGAAGGCGGG CAGGAAGACC 
201 TATGCCATGG TGTCCAGCCA CTCAGCTGGT CATTCTCTGG CTTCAGAACT 
251 GGTGGAGTCC CATGATGGAC ATGAGGAGAT CATTAAGGTG TACTTGAAGG 
301 GGAGGTCTGG AGACAAGATG ATTCACGAGA AGAATATTAA CCAGCTGAAG 
351 AGTGAGGTCC AGTACATCCA GGAGGCCAGG AACTGCCTAC AGAAGCTCCG 
401 GGAGGATATA AGTAGCAAGC TTGACAGGAA CCTAGGAGAT TCTCTCCATC 
451 GACAGGAGAT ACAGGTGGTG CTAGAAAAGC CAAATGGCTT TAGTCAGAGT 
501 CCCACAGCCC TGTACAGCAG CCCACCTGAG GTGGACACCT GTATAAATGA 
551 GGATGTTGAG AGCTTGAGGA AGACGGTGCA GGACTTGCTG GCCAAGCTTC 
601 AGGAGGCCAA GCGGCAACAC CAGTCAGACT GTGTGGCTTT TGAGGTCACA 
651 CTCAGCCGGT ACCAGAGGGA AGCAGAACAA AGTAATGTGG CCCTTCAGAG 
701 AGAGGAGGAC AGAGTGGAGC AGAAAGAGGC AGAAGTCGGA GAGCTGCAGA 
751 GGCGCTTGCT AGGGATGGAG ACGGAGCATC AGGCCTTACT GGCGAAAGTG 
801 AGGGAAGGGG AGGTGGCCCT AGAGGAACTT CGGAGCAACA ATGCTGACTG 
851 CCAAGCAGAA CGAGAAAAGG CTGCTACCCT GGAAAAGGAA GTGGCCGGGT 
901 TGCGGGAGAA GATCCACCAC TTGGATGACA TGCTCAAGAG CCAGCAGCGG 
951 AAAGTCCGGC AAATGATAGA GCAGCTCCAG AATTCAAAAG CTGTGATCCA 
1001 GTCAAAGGAC GCCACCATCC AGGAGCTCAA GGAGAAAATC GCCTATCTGG 
1051 AGGCAGAGAA TTTAGAGATG CATGACCGGA TGGAACACCT GATAGAAAAA 
1101 CAAATCAGTC ATGGCAACTT CAGCACCCAG GCCCGGGCCA AGACAGAGAA 
1151 CCCGGGCAGT ATTAGGATAT CCAAGCCGCC TAGCCCGAAG CCCATGCCTG 
1201 TCATCCGAGT GGTGGAAACC TGAGCTGCCT GGAGATGGTT GCTGCCATTG 
1251 CTGCTGCCTC TGCCTCGGAG AAGCCCACTG CCCCTGTTGG CTGTTAACAC 
1301 TGCCTTTGAC TTCCTGACTG TCCCCTGGCT GCACCCAGGA CTTCGGGCTC 
1351 CTGTGTCTCA CCATTCCCAA GCCCCTGGCC ACTCTAAGCT GGGCAGACGG 
1401 AGCACGAGCA CCTATTCAAG GCACTGCAGC CCTTTGGAAG ACATTGTCCT 
1451 GCAAGCAGGA GCCAGGGCAA TATCTATATT CCTACAGTGA CTATTTTTCT 
1501 CTGTAGAGAG CCTCCCTTCT GTTGTAGACT GGACTCTGGC TGCGCCATAA 
1551 GCCAGGCCTT CATCAGATTG GGAGAGGTGA CAAGATTTGC CTCAGCCCTA 
1601 AAAGCTGGAG ACACAGATGT CCAGAGTGAT TGGAGAATGT CCTGGGGGAA 
1651 TGAAGTTCCT TCCACAAACA CAGCTCAGTT CTTAGCAACA AACTGTTTGT 
1701 TTTTCTACTT GCTCCATCTG CAGCCTACGC TGCCCTGGCC TCCTGCAGAC 
1751 AGATAGTGGG GTTACCTGGC AAGGCCTGGT GAGAGCCAGT GAACCTAAGC 
1801 TTTGACTGGG TGGCCTTGTC TTTCTGGGGA GGAGGGAATG TACATTCAGG 
1851 GAGTAGCCTT TTGCGGAAAA ATTCTCTAGG GCTACAGACA GTCATGTGTG 
1901 ACTTCTCTCT GCTGTGAAAA CTCCCAGAGT CTCTTTAGGG ATTTTCCCTA 
1951 AGGTGTACCA CCAGGCACAC CTCAGTCTTC TTGACCCAGA GCCTGAAAAC 
2001 TGTTTTCACT GGGTTCCACC AGTCCCAGCA AAATCCTCTT TGTATTTATT 
2051 TTGCTAAGTT ATTGGTGGTT TTGCTTACAT CTCATGATTG ATATAATACC 
2101 AAAGTTCTAT AGCCTTCTCT TGCAGTATTT GGATTTGCTT GAAACCGGGA 
2151 AAACTGTTCC CATTAGGCTT GTTAATGTCA GAGTGACACT ATTATGAATC 
2201 TTTCTCTCCC TTTCCTCTGC CTGTTTCTTC TCTCTTTCTC CTTCAAACTT 
2251 GCTCTGCAGC TAAGGAAGGT GAGTCTACTT TCCCTGAGGC TTTGGGGTCA 
2301 GAGTATATGT TGTTTGGAGA AAGAGGGCAA TCAGGACTCT TCTGGGACCC 
2351 AGATGAGTTC TTCACTAGCC CTTCTGAACC CCTTGCTCCA TAATTGGTCT 
2401 TTTATCCTGG CTCTGAATGA CCCTGCAGGT CATCATGGTT TTCTTTTTTT 
2451 ATTGTTTTTT TTTTTTTCTG AGACAGAGTC TCACTCTGTC ACCCAGGCTG 
2501 GAGTGCAGTG GCGCGATCTC AGCTCACTGC AACCTCTGCC TCCCGGATTT 
2551 AAGCGATTCT TCTGCCTCAG CCTCCCGAGT AGCTGGGACT ACAGGTGTGC 
2601 CACCACGCCT GGCTGATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA 
2651 TACTGGCTAG GCTGGTCTCG AATTCCTGAC CTCAGGTGAT CCACCCACCT 
2701 CGGCTTCCCA AAGTGCTAGG ATTATAGGCT TGAGCTACTG TGCCCGGCCC 
2751 ATGGTGTTTT TCTTTAGGGC TCTTCCTACA GCCTTGAGAA GTAGATAGGC 
2801 ATCAGAGTAT GGTACTATAG GAATCAGAAA AATTCAAAAC AAATGTGGAT 
2851 TAAGTGTTTA GGCTCTATGT GGCTCACGCA GCCAGAATCC TTAAGTCTGT 
2901 GTGTTTCTGT GTCTCAAGAC TGGGCTCACA TTCTGGCTTT GTCCATAACA 
2951 ATGCTCTGGG ATTTCAGGGA GTTCCCTCAT TTGTAAAATG AGGGGGTCAG 
3001 AGCAGGTGAT ATCCATGTTT CTTCCCTTTC TGATATTGTT GTCTGTGGCA 
3051 TATTCTTTGT ATGGCGAATT TAATAAATTA TATTAATGTG TCTAAAAAAA 
3101 AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98200312: 

Tuftelin — aspects of protein and gene structure 
97228909: 

Timing of the expression of enamel gene products during mouse tooth 
development. 

91340750: 

Sequencing of bovine enamelin ("tuftelin") a novel acidic enamel 
protein. 



Peptide information for frame 3 



ORF from 51 bp to 1220 bp; peptide length: 390 
Category: strong similarity to known protein 



1 MNGTRNWCTL VDVHPEDQAA GSVDILRLTL QGSLTGDSLE HIAQKAGRKT 

51 YAMVSSHSAG HSLASELVES HDGHEEIIKV YLKGRSGDKM IHEKNINQLK 

101 SEVQYIQEAR NCLQKLREDI SSKLDRNLGD SLHRQEIQVV LEKPNGFSQS 

151 PTALYSSPPE VDTCINEDVE SLRKTVQDLL AKLQEAKRQH QSDCVAFEVT 

201 LSRYQREAEQ SNVALQREED RVEQKEAEVG ELQRRLLGME TEHQALLAKV 

251 REGEVALEEL RSNNADCQAE REKAATLEKE VAGLREKIHH LDDMLKSQQR 

301 KVRQMIEQLQ NSKAVIQSKD ATIQELKEKI AYLEAENLEM HDRMEHLIEK 

351 QISHGNFSTQ ARAKTENPGS IRISKPPSPK EMPVIRVVET 

BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19g22, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_19g22, frame 3 



Report for DKFZphutel_19g22.3 



[LENGTH] 390 

[MW] 44264.09 

[pi] 5.68 

[HOMOL] TREMBL:AF047704_1 product: "tuftelin"; Mus musculus tuftelin mRNA, complete cds. 0.0 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 7e-11 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 1e-08 

[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 6e-08 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL086w] 6e-08 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 7e-08 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 7e-08 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-07 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR285w] 2e-07 

[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-07 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 1e-05 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 1e-04 

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04 


r FTINPAT 1 


f) ? ^ rutnH np<;i <; C ^ fPVft/i'^i^f^ YHRfl?^u MY(~) 1 — Ttivn^in — T "i *irt f n rrn 1 d A — fl d 


[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 4e-04 


| £ Ull V^rt 1 J 


n^ httt^ n i "7 .a t~ i r^i ~f ron t - r^fno f ^ roroui ^0 VMQ^Qdi.il 7 d — H J 

JU . U J <-J-L LjalllZOI LU11 KJ L. LCIIL1UD UHltZ L 1 * LCiCwlalaC/ 1 IMr\Zl 34W J /C! UM 


[ EC ] 


3 6 1 32 Mvo^in ATP^^p flp-OQ 


[ PIRKW] 


Vi 1 o f V" »3 H pmi nn onH 1*^ — 0*7 

VJ 1-\J r^CU C11LIJ_11W C1LU J_ C w / 


[ PIRKW] 


finrl one 1 o — 11 fi 


[PIRKW] citrulline 1e-07 


r PTRKW 1 
I r 1 xviwr j 


Lai li-ieill LcjJCOL OC 


[ PIRKW ] 


h a t" r"<*iH "i mo i~ "1 <=• — H£ 

1 1C 1. C J- (JVJ.-LILIC J. JC uu 


[PIRKW] DNA repair 2e-06 


[ PIRKW] 




[ PIRKW] 


C11UULV -J *J / 


[ PIRKW] 




[ PIRKW] 


7 i nr "f i n f 7 <=> r To - (17 


F PTRKW1 


lilt; L d J. iJJLIlv*_LIiy Jc LI / 


T PTRKW 1 


miKrl p rnnt T'arhi i*in Qp-HQ 


[ DT R™ 1 




[ PIRKW ] 


3 <" +■ 1 hi i Pa — HQ 
aC.LJ.ri L)±ilU.±Jiy □ e U 


[ PIRKW] 




[PIRKW] cell division control 1e-06
[PIRKW] ATP 8e-09


[ P I RKW ] 


f h T*nmn r>m?i 1 nrntpin 3 p — fl 


[ P I RKW ] 


thick filament 8e-09 


[ PIRKW] 


phosphoprotein le-145 


[PIRKW] skeletal muscle 8e-09
[PIRKW] calcium binding 1e-07


[ PIRKW ] 


mpi s ? p— 0 


[PIRKW] alternative splicing 7e-08
[PIRKW] DNA condensation 3e-06
[PIRKW] coiled coil 4e-10
[PIRKW] P-loop 8e-09
[PIRKW] heptad repeat 1e-07
[PIRKW] methylated amino acid 8e-09
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 3e-07
[PIRKW] cardiac muscle 8e-09
[PIRKW] hydrolase 8e-09
[PIRKW] muscle 7e-08
[PIRKW] EF hand 1e-07
[PIRKW] cytoskeleton 7e-08
[PIRKW] hair 1e-07
[PIRKW] smooth muscle 7e-08
[PIRKW] calmodulin binding 3e-07
[SUPFAM] conserved hypothetical P115 protein 2e-09
[SUPFAM] myosin heavy chain 8e-09
[SUPFAM] RAD50 protein 2e-06
[SUPFAM] calmodulin repeat homology 1e-07
[SUPFAM] myosin motor domain homology 8e-09
[SUPFAM] alpha-actinin actin-binding domain homology 1e-06
[SUPFAM] tropomyosin 7e-08
[SUPFAM] protein-tyrosine kinase ret 3e-07
[SUPFAM] plectin 1e-05
[SUPFAM] trichohyalin 1e-07
[SUPFAM] pleckstrin repeat homology 2e-06
[SUPFAM] ribosomal protein S10 homology 1e-06
[SUPFAM] protein kinase homology 3e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06
[SUPFAM] giantin 4e-06
[SUPFAM] kinesin-related protein KLPA 1e-06
[SUPFAM] kinesin motor domain homology 1e-06
[SUPFAM] human early endosome antigen 1 3e-07
[SUPFAM] M5 protein 2e-06
[PROSITE] MYRISTYL 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.62 %
[KW] COILED COIL 35.13 %



SEQ MNGTRNWCTLVDVHPEDQAAGSVDILRLTLQGELTGDELEHIAQKAGRKTYAMVSSHSAG 

SEG 

PRD cccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ HSLASELVESHDGHEEIIKVYLKGRSGDKMIHEKNINQLKSEVQYIQEARNCLQKLREDI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ SSKLDRNLGDSLHRQEIQVVLEKPNGFSQSPTALYSSPPEVDTCINEDVESLRKTVQDLL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC 

SEQ AKLQEAKRQHQSDCVAFEVTLSRYQREAEQSNVALQREEDRVEQKEAEVGELQRRLLGME 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ TEHQALLAKVREGEVALEELRSNNADCQAEREKAATLEKEVAGLREKIHHLDDMLKSQQR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KVRQMIEQLQNSKAVIQSKDATIQELKEKIAYLEAENL3MHDRMEHLIEKQISHGNFSTQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ ARAKTENPGSIRISKPPSPKPMPVIRVVET 

SEG xxxxxxxxxxxxxxxxxx . . . 

PRD hhcccccccceeeecccccccccceeeccc 

COILS 



Prosite for DKFZphutel_19g22.3

PS00001     2->6     ASN_GLYCOSYLATION   PDOC00001
PS00001   356->360   ASN_GLYCOSYLATION   PDOC00001
PS00005   121->124   PKC_PHOSPHO_SITE    PDOC00005
PS00005   171->174   PKC_PHOSPHO_SITE    PDOC00005
PS00005   370->373   PKC_PHOSPHO_SITE    PDOC00005
PS00005   378->381   PKC_PHOSPHO_SITE    PDOC00005
PS00006     9->13    CK2_PHOSPHO_SITE    PDOC00006
PS00006    35->39    CK2_PHOSPHO_SITE    PDOC00006
PS00006   122->126   CK2_PHOSPHO_SITE    PDOC00006
PS00006   157->161   CK2_PHOSPHO_SITE    PDOC00006
PS00006   175->179   CK2_PHOSPHO_SITE    PDOC00006
PS00006   322->326   CK2_PHOSPHO_SITE    PDOC00006
PS00008   355->361   MYRISTYL            PDOC00008
PS00009    46->50    AMIDATION           PDOC00009

(No Pfam data available for DKFZphutel_19g22.3)

DKFZphutel_19hl7



group: intracellular transport and trafficking 

DKFZphutel_19hl7 encodes a novel 879 amino acid protein with similarity to the N. crassa osbP
oxysterol-binding protein.

The novel protein contains an oxysterol-binding protein family signature. Mammalian
oxysterol-binding protein (OSBP) is a protein that binds a variety of oxysterols (oxygenated
derivatives of cholesterol). OSBP seems to play a complex role in the regulation of sterol
metabolism. OSBP is a cytosolic/Golgi receptor for oxysterols such as 25-hydroxycholesterol,
and thus a potential target of sphingomyelin turnover and cholesterol mobilization at the
plasma membrane and/or Golgi apparatus. Therefore, the new protein seems to be involved in
oxysterol metabolism.

The new protein can find application in modulating the response of cells to oxysterols. The
protein can be used as a marker for the Golgi system. The protein might be used to direct
drugs to the Golgi system in response to oxidative stress.



strong similarity to C.elegans ZK1086.1 and oxysterol-binding proteins 

complete cDNA, complete cds, few EST hits 

similarity to proteins involved in steroid biosynthesis 



Sequenced by AGOWA 
Locus: unknown 



Insert length: 3828 bp 

Poly A stretch at pos. 3811, polyadenylation signal at pos. 3784
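The two positions reported above can be checked against the printed sequence. The following is a minimal sketch, not the annotation pipeline's own method: it assumes the canonical AATAAA hexamer is the signal being reported and that positions are 0-based offsets into the insert (an inference from this clone only; the listing does not state its convention). The function names and the `tail` variable are hypothetical.

```python
def find_polya_signal(seq: str, offset: int = 0, motif: str = "AATAAA") -> int:
    """Return the offset of the last occurrence of motif, or -1 if absent."""
    idx = seq.upper().rfind(motif)
    return -1 if idx < 0 else idx + offset


def find_polya_start(seq: str, offset: int = 0) -> int:
    """Return the offset at which the terminal run of A residues begins."""
    stripped = seq.upper().rstrip("A")
    return len(stripped) + offset


# Last 78 nt of the DKFZphutel_19hl7 insert as printed in the listing
# (rows labelled 3751 and 3801), so the slice starts at 0-based offset 3750.
tail = (
    "CCATCTGGGGATGTGTCTGTGTTCCACACCTTGAAATAAACAGACACATA"
    + "CGTGTTCTCTT" + "A" * 17
)

signal_pos = find_polya_signal(tail, offset=3750)
polya_pos = find_polya_start(tail, offset=3750)
print(signal_pos, polya_pos)
```

Under these assumptions both values agree with the positions given in the listing for this clone; other clones in the document may follow a different counting rule.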



1 GCCCGCGCGC CCGGCCGGCC CGGAGCACCG AGCTCGCGGC ACGGTAGGAG 
51 AAGCCCCCGA GCGCCCACAG CATGAAGGAG GAGGCCTTCC TCCGGCGCCG 
101 CTTCTCCCTG TGTCCACCTT CCTCCACCCC TCAGAAAGTC GACCCCCGGA 
151 AGCTCACCCG GAACTTGCTC CTCAGCGGAG ACAATGAGCT CTACCCACTC 
201 AGCCCAGGGA AGGACATGGA GCCCAACGGC CCGTCGCTGC CCAGGGATGA 
251 AGGGCCCCCG ACCCCAAGCT CTGCCACGAA GGTGCCACCG GCAGAGTACA 
301 GGCTGTGCAA CGGGTCAGAC AAGGAATGTG TGTCCCCCAC CGCCAGGGTC 
351 ACCAAGAAGG AGACTCTCAA GGCGCAGAAG GAGAACTACC GGCAGGAGAA 
401 GAAGCGCGCC ACACGGCAGC TGCTCAGCGC TCTGACAGAC CCCAGCGTGG 
451 TCATCATGGC TGACAGCCTG AAGATCCGCG GCACCCTGAA GAGCTGGACC 
501 AAGCTGTGGT GCGTGCTGAA GCCGGGGGTG CTGCTCATCT ACAAGACGCC 
551 CAAGGTGGGC CAGTGGGTGG GCACGGTGCT GCTGCACTGC TGCGAGCTCA 
601 TCGAGCGGCC CTCCAAGAAG GACGGCTTCT GCTTCAAGCT CTTCCACCCG 
651 CTGGATCAGT CCGTCTGGGC CGTGAAGGGC CCCAAAGGTG AGAGCGTGGG 
701 CTCCATCACA CAGCCCCTGC CCAGCAGCTA CCTGATCTTC AGGGCCGCCT 
751 CCGAGTCAGA TGGTCGCTGC TGGCTGGACG CCCTGGAGCT GGCCCTGCGC 
801 TGCTCTAGCC TACTGAGACT GGGCACCTGC AAGCCGGGCC GAGACGGGGA 
851 GCCAGGGACC TCGCCAGACG CATCACCCTC ATCGCTCTGT GGGCTGCCAG 
901 CCTCAGCCAC TGTCCACCCA GACCAAGACC TGTTCCCACT GAACGGGTCT 
951 TCCCTGGAGA ACGATGCATT CTCAGACAAG TCGGAGAGAG AGAACCCTGA
1001 GGAGTCAGAT ACCGAGACCC AGGACCATAG CCGGAAGACG GAGAGTGGCA 
1051 GCGACCAGTC AGAGACCCCT GGGGCCCCGG TGCGGAGAGG GACCACCTAT 
1101 GTGGAGCAGG TCCAGGAGGA GCTGGGGGAG CTGGGCGAGG CGTCCCAGGT 
1151 GGAGACAGTG TCAGAGGAGA ACAAGAGTCT GATGTGGACC CTGCTGAAGC 
1201 AGCTACGGCC AGGCATGGAC CTGTCCCGCG TGGTGCTACC CACGTTCGTA 
1251 CTGGAGCCGC GCTCCTTCCT GAACAAGCTC TCCGACTACT ACTACCACGC 
1301 AGACCTGCTC TCCAGGGCTG CGGTGGAGGA GGATGCCTAC AGCCGCATGA 
1351 AGCTGGTGCT GCGGTGGTAC CTGTCTGGCT TCTACAAGAA GCCCAAGGGA 
1401 ATCAAGAAGC CGTACAACCC CATCCTGGGG GAGACCTTCC GCTGCTGCTG 
1451 GTTCCACCCG CAGACTGACA GCCGCACATT CTACATAGCA GAGCAGGTGT 
1501 CCCACCACCC GCCCGTGTCT GCCTTCCACG TCAGCAACCG GAAGGACGGC 
1551 TTCTGCATCA GTGGCAGCAT CACAGCCAAG TCCAGGTTTT ATGGGAACTC 
1601 GCTGTCGGCG CTGCTGGACG GCAAAGCCAC GCTCACCTTC CTGAACCGAG 
1651 CCGAGGATTA CACCCTTACC ATGCCCTACG CCCACTGCAA AGGAATCCTG 
1701 TATGGCACGA TGACCCTGGA GCTGGGTGGG AAGGTCACCA TCGAGTGTGC 
1751 GAAGAACAAC TTCCAGGCCC AGCTGGAATT CAAACTCAAG CCCTTCTTCG 
1801 GGGGTAGCAC CAGCATCAAC CAGATCTCGG GAAAGATCAC GTCGGGAGAG 
1851 GAAGTCCTGG CGAGCCTCAG TGGCCACTGG GACAGGGACG TGTTTATCAA 
1901 GGAGGAAGGG AGCGGAAGCA GTGCGCTTTT CTGGACCCCG AGCGGGGAGG 
1951 TCCGCAGACA GAGGCTGAGG CAGCACACGG TGCCGCTGGA GGAGCAGACG 
2001 GAGCTGGAGT CCGAGAGGCT CTGGCAGCAC GTCACCAGGG CCATCAGCAA
2051 GGGCGACCAG CACAGGGCCA CACAGGAGAA GTTTGCACTG GAGGAGGCAC
2101 AGCGGCAGCG GGCCCGTGAG CGGCAGGAGA GCCTCATGCC CTGGAAGCCG 
2151 CAGCTGTTCC ACCTGGACCC CATCACCCAG GAGTGGCACT ACCGATACGA 
2201 GGACCACAGC CCCTGGGACC CCCTGAAGGA CATCGCCCAG TTTGAGCAAG 
2251 ACGGGATCCT GCGGACCTTG CAGCAGGAGG CCGTGGCCCG CCAGACCACC
2301 TTCCTGGGCA GCCCAGGGCC CAGGCACGAG AGGTCTGGCC CAGACCAGCG
2351 GCTTCGCAAG GCCAGCGACC AGCCCTCCGG CCACAGCCAG GCCACGGAGA 
2401 GCAGCGGATC CACGCCTGAG TCCTGCCCAG AGCTCTCAGA CGAGGAGCAG 
2451 GATGGTGACT TTGTCCCTGG CGGTGAGAGC CCATGCCCTC GGTGCAGGAA 
2501 GGAGGCGCGG CGGCTGCAGG CCCTGCACGA GGCCATCCTC TCCATCCGAG 
2551 AGGCCCAGCA GGAGCTGCAC AGGCACCTCT CGGCCATGCT GAGCTCCACG 
2601 GCACGGGCAG CACAGGCACC GACCCCAGGC CTCCTGCAGA GCCCCCGATC
2651 CTGGTTCCTG CTCTGCGTGT TCCTGGCGTG TCAGCTGTTC ATTAACCACA 
2701 TCCTCAAATA GGAGCCCTGG GGGCAGAGCT CCTGGCCAGT CCCGAGCCCT 
2751 CCCTCCCAGG CACCCAGCAC TTTAAGCCTG CTCCATGGAG GCAGAGAGGC 
2801 CCGGCAAGCA CAGCCACTGT GACGGGGAGT CCAGGCGCAG GAGGGACCCG 
2851 GGGCCACAAG GCGCTGCGGG CCCAGGTGTG CTGGGCCCCT CTCAGGGGCA 
2901 CTGGCCTCTC TGCAGGGCCT TCCGCCCAGC GCTGGCCTTA ATGCTAAAGC 
2951 CAAATGCAGC TTCTGCTGTG CGACGCACTC CTGGCCATCT TGCCGTGTCA 
3001 CCCCCTGTCC GGCCTCCACT TGCCATGGGG GATGGATGGA TTTAGGGTGG 
3051 GAGGGCCTGT GGGGGCCCTG GACAGTCACA CCCCAGCAGC AGTGAGTGGG 
3101 CAGGTTTGGA GGAGCAGCCA GGGAGCCCCG AGTGGCCCAG GAGTCCCCCC 
3151 ACACACAGAT GCATAGGCCT GCCTTCCGGA GACCCTGTCC ACATTGCCGG 
3201 GACCACCCTG GTGGGGCCAC TGGTGGGTGC CAGGGACAGG TTAGGGCCAC 
3251 TCTGGGGAAG GCATTTTGGT TTTTTATTCC ACGCTCTGCT GTTTGGATGG 
3301 GAGCCCCACA GAGGCAGGTC CTGGAACCAC CCCACCCCCA CACCTGGACG 
3351 CTCGCTCTGG TGGGGGCACA CGCAGGTGGA GGTGGTTGTG GGTGCAGGTG 
3401 TGTGCAGGGG TGTGGGGGGC GCAGGGGTGT GGCTTAGCTG GCCCCGCACC 
3451 CAGGCCGGGG AGGCTCAAGT TCGCCACTTT ACTCAGACCG ATGCACAGTC
3501 TTCCCATTTT ACACTTTTTT AATAAACATA ATTGCAATAT TTTAGGTGGG 
3551 CTGCGAGCTG CAGTCAGCCT TCACGTCTGG CCTCAGTCCC CGTGTCAGTG 
3601 CCGCTCTGCG TGTGCGTGTG CGCGTGTGTG AGCCTCTACA CATATATATA 
3651 TGTACAGAGC CTTAAACCAC ATCGTGGCGG TGCCGTCTGA GCTGTAGCGG 
3701 GTGGCTTTGT TTCCAGTTTT TGTACCCGTG TCCTTGTCTC CCCTCCTCCC 
3751 CCATCTGGGG ATGTGTCTGT GTTCCACACC TTGAAATAAA CAGACACATA 
3801 CGTGTTCTCT TAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



98315477: 

The pleckstrin homology domain of oxysterol-binding
protein recognises a determinant specific to Golgi
membranes.

98146266:

A Drosophila homologue of oxysterol binding protein
(OSBP)--implications for the role of OSBP.



Peptide information for frame 3 



ORF from 72 bp to 2708 bp; peptide length: 879 
Category: strong similarity to known protein 
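The ORF coordinates above can be related to the peptide listing by translating the nucleotide sequence in frame. The sketch below assumes the standard genetic code and uses hypothetical names (`CODON_TABLE`, `translate`, `orf_start`); it translates only the first ten codons of the ORF, copied from positions 72 to 101 of the insert as printed.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Positions 72-101 of the DKFZphutel_19hl7 insert (start of the ORF).
orf_start = "ATGAAGGAGGAGGCCTTCCTCCGGCGCCGC"
peptide_start = translate(orf_start)
print(peptide_start)
```

The result can be compared with the first ten residues of the peptide listed below for frame 3.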



1 MKEEAFLRRR FSLCPPSSTP QKVDPRKLTR NLLLSGDNEL YPLSPGKDME 

51 PNGPSLPRDE GPPTPSSATK VPPAEYRLCN GSDKECVSPT ARVTKKETLK 

101 AQKENYRQEK KRATRQLLSA LTDPSVVIMA DSLKIRGTLK SWTKLWCVLK 

151 PGVLLIYKTP KVGQWVGTVL LHCCELIERP SKKDGFCFKL FHPLDQSVWA

201 VKGPKGESVG SITQPLPSSY LIFRAASESD GRCWLDALEL ALRCSSLLRL 

251 GTCKPGRDGE PGTSPDASPS SLCGLPASAT VHPDQDLFPL NGSSLENDAF 

301 SDKSERENPE ESDTETQDHS RKTESGSDQS ETPGAPVRRG TTYVEQVQEE 

351 LGELGEASQV ETVSEENKSL MWTLLKQLRP GMDLSRVVLP TFVLEPRSFL 

401 NKLSDYYYHA DLLSRAAVEE DAYSRMKLVL RWYLSGFYKK PKGIKKPYNP 

451 ILGETFRCCW FHPQTDSRTF YIAEQVSHHP PVSAFHVSNR KDGFCISGSI 

501 TAKSRFYGNS LSALLDGKAT LTFLNRAEDY TLTMPYAHCK GILYGTMTLE 

551 LGGKVTIECA KNNFQAQLEF KLKPFFGGST SINQISGKIT SGEEVLASLS 

601 GHWDRDVFIK EEGSGSSALF WTPSGEVRRQ RLRQHTVPLE EQTELESERL 
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651 WQHVTRAISK GDQHRATQEK FALEEAQRQR ARERQESLMP WKPQLFHLDP 

701 ITQEWHYRYE DHSPWDPLKD IAQFEQDGIL RTLQQEAVAR QTTFLGSPGP 

751 RHERSGPDQR LRKASDQPSG HSQATESSGS TPESCPELSD EEQDGDFVPG 

801 GESPCPRCRK EARRLQALHE AILSIREAQQ ELHRHLSAML SSTARAAQAP 

851 TPGLLQSPRS WFLLCVFLAC QLFINHILK 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_19hl7, frame 3

TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid
ZK1086, N = 1, Score = 1495, P = 2.7e-153

PIR:S25324 hypothetical protein YKR003w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 574, P = 8.5e-57

TREMBL:CEAF195_7 gene: "C32F10.1"; Caenorhabditis elegans cosmid
C32F10., N = 1, Score = 588, P = 8.6e-57

PIR:S46796 hypothetical protein YKR003w homolog YHR001w - yeast
(Saccharomyces cerevisiae), N = 1, Score = 585, P = 1.9e-56

TREMBL:NCOSBP_1 gene: "osbP"; product: "oxysterol-binding protein";
N.crassa mRNA for putative oxysterol-binding protein, N = 1, Score =
571, P = 7e-55

TREMBL:AB017026_1 product: "oxysterol-binding protein"; Mus musculus
mRNA for oxysterol-binding protein, complete cds., N = 2, Score = 328,
P = 3e-35



>TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086 
Length = 751 



HSPs : 



Score = 1495 (224.3 bits), Expect = 2.7e-153, P = 2.7e-153 
Identities = 327/663 (49%), Positives = 430/663 (64%) 
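The percentages on the Identities/Positives line follow directly from the reported counts. As a hedged illustration, the sketch below parses such a line and recomputes the percentages with integer (floor) division; the floor rule is an assumption that happens to reproduce the figures printed in this listing, and `parse_hsp_counts` is a hypothetical helper, not part of any BLAST distribution.

```python
import re

HSP_PATTERN = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")


def parse_hsp_counts(line: str) -> dict:
    """Map each label to (matches, alignment_length, floor_percent)."""
    result = {}
    for label, matches, length in HSP_PATTERN.findall(line):
        m, n = int(matches), int(length)
        result[label] = (m, n, 100 * m // n)  # floor, not rounding
    return result


hsp = parse_hsp_counts("Identities = 327/663 (49%), Positives = 430/663 (64%)")
print(hsp)
```

With rounding to nearest, the Positives value would come out one point higher than printed, which is why floor division is used here.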

Query: 129 MADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKV--GQWVGTVLLHCCELIERPSKKDGF 186

MAD+LKIRG LK W ++ +CVLKPG+L++YK K G WVGTVLL+ CELIERPSKKDGF
Sbjct: 1 MADTLKIRGALKRWNRYYCVLKPGLLILYKHKKADRGDWVGTVLLNHCELIERPSKKDGF 60 

Query: 187 CFKLFHPLDQSVWAVKGPKGESVGSIT-QPLPSSYLIFRAASESDGRCWLDALELALRCS 245 

CFKLFHP+D S+W +GP G+S GS T PL +S+LI RA S+ GRCW+DALEL+ +C+ 
Sbjct: 61 CFKLFHPMDMSIWGNRGPLGQSFGSFTLNPLNTSFLICRAPSDQAGRCWMDALELSFKCT 120 

Query: 246 SLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAFSDK-S 304

LL+ T D + G D+S +G ++DD G AS+ + 

Sbjct: 121 GLLKK-TMNE-LDDKNG- — DSSMND— GQRCESRMSRDSD GDDTRELAVSETDA 168 

Query: 305 ERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTT YVEQVQEELGELGEASQVE 361 

E+ E D + +DH E G SET +R T ++ +E G G S E 

Sbjct: 169 EKHFQEIDDVQDEDH EDGK-MSETSDT-IREAFTESAWIPSPKEVFGPDG--SLTE 220 

Query: 362 TVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEED 421 

V EENKSL+WTLLKQ+RPGMDLS+VVLPTF+LEPRSFL KL+DYYYHADL+S A ED 
Sbjct: 221 EVGEENKSLIWTLLKQIRPGMDLSKVVLPTFILEPRSFLEKLADYYYHADLISEAVAEPD 280 

Query: 422 AYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHPP 481

+ R+ V +++LSGFYKKPKG+KKPYNPILGETFRC W HP S TFY+AEQVSHHPP 
Sbjct: 281 PFQRIVKVTKFFLSGFYKKPKGLKKPYNPILGETFRCKWEHPD-GSTTFYMAEQVSHHPP 339 

Query: 482 VSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCKG 541

VS+ ++NRK GF ISG+I AKS++YGNSLSA+L GK LT LN E Y + +PYA+CKG 
Sbjct: 340 VSSLFITNRKAGFNISGTILAKSKYYGNSLSAILAGKLRLTLLNLGETYIVNLPYANCKG 399 

Query: 542 ILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLSG 601

I+ GTMT+ELGG+V IEC K ++ L+FKLKP GG+ NQI G I G + LAS+ G
Sbjct: 400 IMIGTMTMELGGEVNIECEKTGYRTTLDFKLKPMLGGA--YNQIEGSIKYGSDRLASIEG 457

Query: 602 HWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISKG 661 

WD + IK G W P+ EV + RL ++ + ++EQ E ES +LW+HVT AIS 

Sbjct: 458 AWDGVIRIK--GPDGKKELWNPTPEVIKTRLPRYEINMDEQGEWESAKLWRHVTEAISNE 515 

Query: 662 DQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKDI 721 

DQ++AT+EK ALE QR RA+ S +P + + F ++ Y + D+ PWD DI 

Sbjct: 516 DQYKATEEKTALENDQRARAK----SGIPHETKFFKKQH-GDDYVYIHADYRPWDNNNDI 570

Query: 722 AQFEQDGILRTLQQEAVAR--QTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSG 779

Q E + +++T+ + + + + LGS E S D+ + +p + + 

Sbjct: 571 QQIENNYVVKTISRHSKRKTGNSEQLGSDNTS-EASESDEEVI EPKIKKKEIVPAK 625 

Query: 780 STPESCPELSDE 791 

S P + PE++DE 
Sbjct: 626 SKPIT-PEVADE 636 



Pedant information for DKFZphutel_19hl7, frame 3 



Report for DKFZphutel_19hl7.3

[LENGTH] 879
[MW] 98616.79
[pI] 7.29
[HOMOL] TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086 1e-157
[FUNCAT] 01.06.16 lipid and fatty-acid binding [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YAR044w] 5e-20
[BLOCKS] BL00168F
[BLOCKS] BL01013D Oxysterol-binding protein family proteins
[BLOCKS] BL01013C Oxysterol-binding protein family proteins
[BLOCKS] BL01013B Oxysterol-binding protein family proteins
[BLOCKS] BL01013A Oxysterol-binding protein family proteins
[PIRKW] transmembrane protein 1e-19
[SUPFAM] pleckstrin repeat homology 8e-18
[SUPFAM] ankyrin repeat homology 1e-19
[SUPFAM] unassigned ankyrin repeat proteins 1e-19
[PROSITE] MYRISTYL 12
[PROSITE] CAMP_PHOSPHO_SITE 6
[PROSITE] OSBP 1
[PROSITE] CK2_PHOSPHO_SITE 21
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 20
[PROSITE] ASN_GLYCOSYLATION 3
[PFAM] PH (pleckstrin homology) domain
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 2.96 %
[KW] COILED COIL 3.53 %



SEQ MKEEAFLRRRFSLCPPSSTPQKVDPRKLTRNLLLSGDNELYPLSPGKDMEPNGPSLPRDE 

SEG 

PRD ccchhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

COILS

MEM 

SEQ GPPTPSSATKVPPAEYRLCNGSDKECVSPTARVTKKETLKAQKENYRQEKKRATRQLLSA 

SEG 

PRD cccccccccccccceeeecccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LTDPSVVIMADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKVGQWVGTVLLHCCELIERP 

SEG 

PRD hcccceeeecccccccccccccceeeeeeccceeeeecccccccceeeeecccccccccc 

COILS CCC 

MEM 

SEQ SKKDGFCFKLFHPLDQSVWAVKGPKGESVGSITQPLPSSYLIFRAASESDGRCWLDALEL 

SEG 

PRD ccccceeeeecccccceeeeecccccceeecccccccceeeeeeehhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ ALRCSSLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAF 

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ SDKSERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTTYVEQVQEELGELGEASQV

SEG xxxxxxxxxxxxx . . . .

PRD cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccc 

COILS 

MEM 

SEQ ETVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEE 

SEG 

PRD cccccccchhhhhhhhhhcccccceeeccceeeecccchhhhhhhhhccccccccccccc 

COILS 

MEM 

SEQ DAYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHP

SEG 

PRD chhhhhhhhhhhhhhhcccccccccccccccccceeeeeecccccccceeeeeccccccc 

COILS 

MEM 

SEQ PVSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCK

SEG 

PRD cceeeeecccccccccccccccccccccccccccccceeeeeeccccceeeeccccceee 

COILS 

MEM 

SEQ GILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLS 

SEG 

PRD eeeeeccccccccceeeeeccccccceeeecccccccccccceeeeeccccccceeeeec 

COILS 

MEM 

SEQ GHWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISK 

SEG 

PRD cccccceeeeeccccceeeeeccccccccccccccccccchhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ GDQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKD 

SEG xxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccchh 

COILS 

MEM 

SEQ IAQFEQDGILRTLQQEAVARQTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSGS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhcccccccccccccccccc

COILS 

MEM 

SEQ TPESCPELSDEEQDGDFVPGGESPCPRCRKEARRLQALHEAILSIREAQQELHRHLSAML 

SEG 

PRD ccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ SSTARAAQAPTPGLLQSPRSWFLLCVFLACQLFINHILK 

SEG 

PRD hhhhhhhcccccccccccceeeeehhhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMMMMMMMMM . 
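The PRD lines above give one predicted state per residue. A short sketch of how such a string might be summarized is shown below; it assumes the conventional reading of the letters (h = helix, e = extended strand, c = coil), which the listing itself does not define, and `summarize_states` is a hypothetical helper. The input string is a made-up example, not data from this clone.

```python
from collections import Counter


def summarize_states(prd: str) -> dict:
    """Return the fraction of h, e and c states in a PRD string."""
    states = [ch for ch in prd.lower() if ch in "hec"]  # skip spaces and noise
    counts = Counter(states)
    total = len(states)
    return {s: (counts[s] / total if total else 0.0) for s in "hec"}


# Illustrative input only: 4 helix, 2 strand and 4 coil positions.
example = summarize_states("hhhhcceecc")
print(example)
```

Concatenating the PRD lines of one report before calling the function would give per-protein fractions; characters outside h/e/c (for example OCR damage) are ignored rather than counted.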



Prosite for DKFZphutel_19hl7.3

PS00001    80->84    ASN_GLYCOSYLATION    PDOC00001
PS00001   291->295   ASN_GLYCOSYLATION    PDOC00001
PS00001   367->371   ASN_GLYCOSYLATION    PDOC00001
PS00004     9->13    CAMP_PHOSPHO_SITE    PDOC00004
PS00004    26->30    CAMP_PHOSPHO_SITE    PDOC00004
PS00004    95->99    CAMP_PHOSPHO_SITE    PDOC00004
PS00004   111->115   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   338->342   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   762->766   CAMP_PHOSPHO_SITE    PDOC00004
PS00005    82->85    PKC_PHOSPHO_SITE     PDOC00005
PS00005    90->93    PKC_PHOSPHO_SITE     PDOC00005
PS00005    94->97    PKC_PHOSPHO_SITE     PDOC00005
PS00005    98->101   PKC_PHOSPHO_SITE     PDOC00005
PS00005   132->135   PKC_PHOSPHO_SITE     PDOC00005
PS00005   138->141   PKC_PHOSPHO_SITE     PDOC00005
PS00005   159->162   PKC_PHOSPHO_SITE     PDOC00005
PS00005   181->184   PKC_PHOSPHO_SITE     PDOC00005
PS00005   252->255   PKC_PHOSPHO_SITE     PDOC00005
PS00005   586->589   PKC_PHOSPHO_SITE     PDOC00005
PS00005   843->846   PKC_PHOSPHO_SITE     PDOC00005
                     PROKAR_LIPOPROTEIN
                     OSBP                 PDOC00774



Pfam for DKFZphutel_19hl7.3



HMM_NAME PH (pleckstrin homology) domain 

HMM *dvIREGWMyKWgswrkstgnWqrRWFvLrndpnrLiYYkddkdekPrYM 
+VI+ +++++G + W + W+VL++ ++L+ YK + + + ++ 

Query 126 VVIMADSLKIRGTLKS WTKLWCVLKP — GVLLI YKTP-KVGQWVG 167 

HMM lldldcWrMidVEidWmmdndHCFilWtrq 

L+C+ +1+ ++ ++ +CF+++ + 
Query 168 TVLLHCCELIERPSKKD GFCFKLFHPLDQSVWAVKGPKGESVGSITQ 214 

HMM .... rtYYFQAeNeEEMmeWMsalrRalw* 

+ ++F+A++E++ + W++A++ A++ 
Query 215 PLPSSYLI FRAASES DGRCWLDALELALR 243

DKFZphutel_19jll



group: uterus derived 

DKFZphutel_19jll encodes a novel 708 amino acid protein with C-terminal similarity to several
known proteins, such as human KIAA0231 or murine ras binding protein Sur8.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

Strong similarity to KIAA0231, similarity to ras binding protein Sur8

EST AA854189 extends the sequence (294 bp); with this sequence,
complete cDNA.

Sequenced by AGOWA 

Locus: unknown

Insert length: 2343 bp

Poly A stretch at pos. 2323, polyadenylation signal at pos. 2295



1 GCTCCTGCTA ACCCCATCAC TGTGGAAATG AAAGGCCTGA AGACAGATTT
51 GGACCTTCAG CAGTACAGCT TTATAAATCA GATGTGTTAT GAGCGAGCCC
101 TCCACTGGTA TGCCAAGTAT TTCCCTTACC TTGTCCTCAT CCATACCCTG
151 GTCTTTATGC TCTGCAGTAA CTTTTGGTTC AAATTCCCTG GTTCCAGCTC
201 CAAAATAGAA CATTTCATCT CCATTCTGGG GAAGTGTTTT GACTCTCCTT
251 GGACCACACG GGCTTTATCT GAAGTGTCTG CGGAGGACTC AGAAGAAAAG
301 GACAACAGGA AGAACAACAT GAACAGGTCC AACACCATCC AATCTGGTCC
351 AGAAGGCAGC CTGGTCAACT CTCAGTCTTT AAAGTCCATT CCTGAGAAGT
401 TTGTAGTTGA TAAATCCACT GCAGGGGCTC TGGATAAAAA GGAAGGTGAG
451 CAGGCTAAGG CCTTATTTGA GAAGGTGAAG AAGTTCAGGC TGCATGTGGA
501 AGAAGGTGAT ATTCTATATG CCATGTATGT TCGCCAGACT GTACTTAAAG
551 TTATCAAATT CCTAATCATC ATTGCATATA ATAGTGCTCT GGTTTCCAAG
601 GTCCAGTTTA CAGTGGACTG TAATGTGGAC ATTCAGGACA TGACTGGATA
651 TAAAAACTTT TCTTGCAATC ATACCATGGC ACACTTGTTC TCAAAACTGT
701 CCTTTTGCTA TCTGTGCTTT GTTAGTATCT ATGGATTGAC GTGCCTTTAT
751 ACCTTATACT GGCTGTTCTA CCGTTCTCTA CGGGAATATT CCTTTGAGTA
801 TGTCCGTCAG GAGACTGGAA TTGATGATAT TCCAGATGTG AAAAATGACT
851 TTGCTTTTAT GCTTCATATG ATAGATCAGT ATGACCCTCT CTATTCCAAG
901 AGATTTGCAG TGTTCCTGTC TGAAGTCAGT GAAAACAAAT TAAAGCAGCT
951 GAACTTAAAT AACGAATGGA CTCCTGATAA ACTGAGGCAG AAGCTACAGA
1001 CAAATGCCCA TAATCGACTG GAATTGCCTC TTATCATGCT CTCTGGCCTT
1051 CCAGACACTG TTTTTGAAAT CACAGAGTTG CAATCTCTAA AACTTGAAAT
1101 CATTAAGAAC GTAATGATAC CAGCCACCAT TGCACAGCTA GACAATCTTC
1151 AAGAGCTCTC TCTGCACCAG TGTTCTGTCA AAATCCACAG TGCGGCGCTC
1201 TCTTTCCTGA AGGAAAACCT CAAGGTCTTG AGCGTCAAGT TTGATGACAT
1251 GAGGGAACTC CCCCCCTGGA TGTATGGGCT CCGAAATCTG GAAGAGCTGT
1301 ACCTAGTTGG CTCTCTAAGT CATGATATTT CCAGAAATGT CACCCTTGAG
1351 TCTCTGCGGG ATCTCAAAAG CCTTAAAATT CTCTCTATCA AAAGCAACGT
1401 TTCCAAAATC CCTCAGGCAG TGGTTGATGT TTCCAGCCAT CTCCAGAAGA
1451 TGTGCATACA TAATGATGGC ACCAAGCTGG TGATGCTCAA CAACTTAAAG
1501 AAGATGACCA ATCTGACAGA GCTGGAGCTG GTCCACTGTG ACCTGGAGCG
1551 TATTCCTCAT GCTGTGTTCA GCCTACTCAG CCTCCAGGAA TTGGACCTGA
1601 AGGAAAACAA TCTGAAATCT ATAGAAGAAA TCGTTAGCTT TCAGCACTTA
1651 AGAAAGTTGA CAGTGCTAAA ACTGTGGCAT AACAGCATCA CCTACATCCC
1701 AGAGCATATA AAGAAACTCA CCAGCCTGGA ACGCCTGTCC TTTAGTCACA
1751 ATAAAATAGA GGTGCTGCCT TCCCACCTCT TCCTATGCAA CAAGATCCGA
1801 TACTTGGACT TATCGTACAA TGACATTCGA TTTATCCCCC CTGAAATTGG
1851 AGTTCTACAA AGTTTACAGT ATTTTTCCAT CACATGTAAC AAAGTGGAAA
1901 GCCTTCCAGA TGAACTCTAC TTCTGCAAGA AACTTAAAAC TCTGAAGATT
1951 GGAAAAAACA GCCTATCTGT ACTTTCACCG AAAATTGGAA ATTTGCTATT
2001 TCTTTCCTAC TTAGATGTAA AAGGTAATCA CTTTGAAATC CTCCCTCCTG
2051 AACTGGGTGA CTGTCGGGCT CTGAAGCGAG CTGGTTTAGT TGTAGAAGAT
2101 GCTCTGTTTG AAACTCTGCC TTCTGACGTC CGGGAGCAAA TGAAAACAGA
2151 ATAACTTATT TTTCGTTAAA GTTTGACTGA AACACGCTTC TACCAAATAC
2201 AGTATAAATA ATTAGGTAGT CTTAATGCCT TTCCTATTTT TTTTTCCTTT
2251 TCACACAAAA TGTACACAAA GATCGCGTAA GGAGTATGTA TTTTTAATAA
2301 AAATTTAATT GTATTTTTTC AATATTAAAA AAAAAAAAAA AAA



BLAST Results 



No BLAST result

Medline entries



96421675: 

Characterization of densin-180, a new brain-specific synaptic protein
of the O-sialoglycoprotein family.

98337190:

SUR-8, a conserved Ras-binding protein with leucine-rich
repeats, positively regulates Ras-mediated signaling in C.
elegans.



Peptide information for frame 1 



ORF from 28 bp to 2151 bp; peptide length: 708 
Category: similarity to known protein 
Classification: Cell signaling/communication 
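The frame number and peptide length quoted for each clone are consistent with simple arithmetic on the ORF coordinates. The sketch below assumes 1-based, inclusive coordinates that exclude the stop codon, which fits both ORFs in this section but is not stated by the listing; the function names are hypothetical.

```python
def reading_frame(orf_start: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based ORF start position."""
    return (orf_start - 1) % 3 + 1


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an inclusive, 1-based ORF that omits the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# DKFZphutel_19jll: ORF from 28 bp to 2151 bp, reported in frame 1.
frame_19jll = reading_frame(28)
length_19jll = peptide_length(28, 2151)

# DKFZphutel_19hl7: ORF from 72 bp to 2708 bp, reported in frame 3.
frame_19hl7 = reading_frame(72)
length_19hl7 = peptide_length(72, 2708)
print(frame_19jll, length_19jll, frame_19hl7, length_19hl7)
```

A span that is not divisible by three raises an error, which can flag a mistyped coordinate when checking other entries.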



1 MKGLKTDLDL QQYSFINQMC YERALHWYAK YFPYLVLIHT LVFMLCSNFW 

51 FKFPGSSSKI EHFISILGKC FDSPWTTRAL SEVSGEDSEE KDNRKNNMNR 

101 SNTIQSGPEG SLVNSQSLKS IPEKFVVDKS TAGALDKKEG EQAKALFEKV 

151 KKFRLHVEEG DILYAMYVRQ TVLKVIKFLI IIAYNSALVS KVQFTVDCNV 

201 DIQDMTGYKN FSCNHTMAHL FSKLSFCYLC FVSIYGLTCL YTLYWLFYRS 

251 LREYSFEYVR QETGIDDIPD VKNDFAFMLH MIDQYDPLYS KRFAVFLSEV 

301 SENKLKQLNL NNEWTPDKLR QKLQTNAHNR LELPLIMLSG LPDTVFEITE

351 LQSLKLEIIK NVMIPATIAQ LDNLQELSLH QCSVKIHSAA LSFLKENLKV

401 LSVKFDDMRE LPPWMYGLRN LEELYLVGSL SHDISRNVTL ESLRDLKSLK 

451 ILSIKSNVSK IPQAVVDVSS HLQKMCIHND GTKLVMLNNL KKMTNLTELE 

501 LVHCDLERIP HAVFSLLSLQ ELDLKENNLK SIEEIVSFQH LRKLTVLKLW 

551 HNSITYIPEH IKKLTSLERL SFSHNKIEVL PSHLFLCNKI RYLDLSYNDI 

601 RFIPPEIGVL QSLQYFSITC NKVESLPDEL YFCKKLKTLK IGKNSLSVLS 

651 PKIGNLLFLS YLDVKGNHFE ILPPELGDCR ALKRAGLVVE DALFETLPSD 
701 VREQMKTE 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_19jll, frame 1 

TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene,
partial cds., N = 1, Score = 1408, P = 4.5e-144

TREMBL:AF054827_1 gene: "soc-2"; product: "leucine-rich repeat protein
SOC-2"; Caenorhabditis elegans leucine-rich repeat protein SOC-2
(soc-2) mRNA, complete cds., N = 1, Score = 304, P = 5.7e-24

TREMBL:RNU66707_1 product: "densin-180"; Rattus norvegicus densin-180
mRNA, complete cds., N = 1, Score = 311, P = 7.4e-24

TREMBL:AF068921_1 product: "Ras-binding protein SUR-8"; Mus musculus
Ras-binding protein SUR-8 mRNA, complete cds., N = 1, Score = 302, P =
1.1e-23



>TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial
cds.

Length = 476 



HSPs : 



Score = 1408 (211.3 bits), Expect = 4.5e-144, P = 4.5e-144 
Identities = 265/471 (56%), Positives = 361/471 (76%) 

Query: 237 LTCLYTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVF 296 

LT Y+L+W+ SL++YSFE +R+++ DIPDVKNDFAF+LH+ DQYDPLYSKRF++F 
Sbjct: 1 LTSSYSLWWMLRSSLKQYSFEALREKSNYSDIPDVKNDFAFILHLADQYDPLYSKRFSIF 60 

Query: 297 LSEVSENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKL 356 

LSEVSENKLKQ+NLNNEWT +KL+ KL NA +++EL L ML+GLPD VFE+TE++ L L 
Sbjct: 61 LSEVSENKLKQINLNNEWTVEKLKSKLVKNAQDKIELHLFMLNGLPDNVFELTEMEVLSL 120

Query: 357 EIIKNVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMY 416

E+I V +P+ ++QL NL+EL ++ S+ + AL+FL+ENLK+L +KF +M ++P W++ 
Sbjct: 121 ELIPEVKLPSAVSQLVNLKELRVYHSSLVVDHPALAFLEENLKILRLKFTEMGKIPRWVF 180 

Query: 417 GLRNLEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMC 476 

L+NL+ELYL G + + + LE +DLK+L+ L +KS++S+IPQ V D+ LQK+ 

Sbjct: 181 HLKNLKELYLSGCVLPEQLSTMQLEGFQDLKNLRTLYLKSSLSRIPQVVTDLLPSLQKLS 240 

Query: 477 IHNDGTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIV 536 

+ N+G+KLV+LNNLKKM NL LEL+ CDLERIPH++FSL +L ELDL+ENNLK++EEI+ 
Sbjct: 241 LDNEGSKLVVLNNLKKMVNLKSLELISCDLERIPHSIFSLNNLHELDLRENNLKTVEEII 300

Query: 537 SFQHLRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLS 596

SFQHL+ L+ LKLWHN+I YIP I L++LE+LS HN IE LP LFLC K+ YLDLS
Sbjct: 301 SFQHLQNLSCLKLWHNNIAYIPAQIGALSNLEQLSLDHNNIENLPLQLFLCTKLHYLDLS 360

Query: 597 YNDIRFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNL 656 

YN + FIP EI L +LQYF++T N +E LPD L+ CKKL+ L +GKNSL LSP +G L 
Sbjct: 361 YNHLTFIPEEIQYLSNLQYFAVTNNNIEMLPDGLFQCKKLQCLLLGKNSLMNLSPHVGEL 420 

Query: 657 LFLSYLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKT 707 

L++L++ GN+ E LPPEL C++LKR L+VE+ L TLP V E+++T 
Sbjct: 421 SNLTHLELIGNYLETLPPELEGCQSLKRNCLIVEENLLNTLPLPVTERLQT 471 



Pedant Information for DKFZphutel_19jll, frame 1 



Report for DKFZphutel_19j11.1

[LENGTH] 708
[MW] 81812.82
[pI] 7.55
[HOMOL] TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial cds. 1e-149
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-17
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 3e-09
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 3e-09
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YAL021c] 9e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR353c] 3e-07
[BLOCKS] BL00868F
[BLOCKS] BL00985B Spermadhesins family proteins
[EC] 3.4.17.3 Lysine carboxypeptidase 1e-08
[EC] 4.6.1.1 Adenylate cyclase 3e-18
[PIRKW] blocked amino end 1e-10
[PIRKW] phosphotransferase 1e-09
[PIRKW] nucleus 6e-08
[PIRKW] duplication 3e-18
[PIRKW] platelet 1e-10
[PIRKW] tandem repeat 7e-16
[PIRKW] keratan sulfate 7e-07
[PIRKW] metallo-carboxypeptidase 1e-08
[PIRKW] transmembrane protein 1e-10
[PIRKW] serine/threonine-specific protein kinase 1e-09
[PIRKW] autophosphorylation 1e-09
[PIRKW] cartilage 7e-07
[PIRKW] connective tissue 7e-07
[PIRKW] magnesium 1e-09
[PIRKW] cAMP biosynthesis 3e-18
[PIRKW] ATP 1e-09
[PIRKW] receptor 1e-09
[PIRKW] leucine zipper 3e-13
[PIRKW] glycoprotein 5e-12
[PIRKW] extracellular matrix 7e-07
[PIRKW] chondroitin sulfate proteoglycan 7e-07
[PIRKW] cell adhesion 1e-08
[PIRKW] hydrolase 1e-08
[PIRKW] sulfoprotein 7e-07
[PIRKW] membrane protein 1e-08
[PIRKW] phosphorus-oxygen lyase 3e-18
[PIRKW] collagen binding 7e-07
[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 3e-21
[SUPFAM] chaoptin 1e-08
[SUPFAM] gelsolin repeat homology 3e-21
[SUPFAM] protein kinase homology 1e-09
[SUPFAM] protein kinase Xa21 1e-09
[SUPFAM] fibromodulin 4e-12
[SUPFAM] yeast adenylate cyclase catalytic domain homology 3e-18
[SUPFAM] yeast adenylate cyclase 3e-18
[KW] TRANSMEMBRANE 3
[KW] LOW_COMPLEXITY 1.41 %

SEQ MKGLKTDLDLQQYSFINQMCYERALHWYAKYFPYLVLIHTLVFMLCSNFWFKFPGSSSKI 

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccceeeeccccccee 
MEM MMMMMMMMMMMMMMMMM 



SEQ EHFISILGKCFDSPWTTRALSEVSGEDSEEKDNRKNNMNRSNTIQSGPEGSLVNSQSLKS

SEG 

PRD eeeeBeeecccccccceeeeecccccccccccccccccccccccccccccceeeeccccc 

MEM 

SEQ IPEKFVVDKSTAGALDKKEGEQAKALFEKVKKFRLHVEEGDILYAMYVRQTVLKVIKFLI 

SEG 

PRD cccceeecccccccccchhhhhhhhhhhhhhhhhhhhcccceeeehhhhhhhhhhhhhhh 

MEM MMMMMMMMM 

SEQ IIAYNSALVSKVQFTVDCNVDIQDMTGYKNFSCNHTMAHLFSKLSFCYLCFVSIYGLTCL 

SEG 

PRD hhhhcchhhhheeeeeccccccccccccccccccchhhhhhhhheeeeeeeeeeccceee 

MEM MMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ YTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVFLSEV

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccchhhhhhhhhhhhh 

MEM 

SEQ SENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKLEIIK 

SEG . .xxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhh 

MEM 

SEQ NVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMYGLRN 

SEG 

PRD hccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhccccccccccccchhhh 

MEM 

SEQ LEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMCIHND 

SEG 

PRD hhhhhhccccccccccccccchhhhhhhhhhhhcccccccccccchhhhhhhhhhhcccc 

MEM 

SEQ GTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIVSFQH

SEG 

PRD ceeeecccccccchhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccccch 

MEM 

SEQ LRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLSYNDI 

SEG 

PRD hhhhhhhcccccceeecccccchhhhhheeeccccceeecccccchhhhhhhhhhccccc 

MEM 

SEQ RFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNLLFLS 

SEG 

PRD cccccccchhhhhhhhhhhccccccccccccchhhhhcccccccceeecccccccchhhh 

MEM 

SEQ YLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKTE 

SEG 

PRD hhhccccccccccccchhhhhhhhheeeeccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_19j11.1)
(No Pfam data available for DKFZphutel_19j11.1)
DKFZphutel_li2



group: transcription factor 

DKFZphutel_li2 encodes a novel 594 amino acid protein similar to signal transducing proteins. 

The protein contains two WD-40 repeats, which are typical of the beta-transducin subunit of
G proteins. In addition, the protein contains a C3HC4 zinc finger and a leucine zipper. The beta
subunits appear to be required for the replacement of GDP by GTP as well as for membrane
anchoring and receptor recognition. Because of the zinc finger, the novel protein appears to be
a new molecule involved in signal transduction and transcription.

The new protein can find application in modulating or blocking the expression of genes
controlled by this molecule.



similarity to Dictyostelium myosin heavy chain kinase

complete cDNA, complete cds, EST hits 

[PFAM] Zinc finger, C3HC4 type (RING finger) 

[PFAM] WD domain, G-beta repeats 

[SCOP] d1tbgc_ 2.46.3.1.1 beta1-subunit of the signal-transducing G protei 3e-07

Sequenced by BMFZ 

Locus: /map="16p13.3"

Insert length: 3584 bp 

Poly A stretch at pos. 3555, polyadenylation signal at pos. 3537 



1 GGGCGGGAGG TGCTTCCCAA GGACCGTAGA TGCCTCTCTA GAGCATGAGC 
51 TCAGGCAAGA GTGCCCGCTA CAACCGCTTC TCCGGGGGGC CCAGCAATCT 
101 TCCCACCCCA GACGTCACCA CAGGGACCAG AATGGAAACG ACCTTCGGAC 
151 CCGCCTTTTC AGCCGTCACC ACCATCACAA AAGCTGACGG GACCAGCACC 
201 TACAAGCAGC ACTGCAGGAC AGCATGCCCC CCATCAGCAC TCCCCGCCGC 
251 TCCGACTCCG CCATCTCTGT CCGCTCCCTG CACTCAGAGT CCAGCATGTC 
301 TCTGCGCTCC ACATTCTCAC TGCCCGAGGA GGAGGAGGAG CCGGAGCCAC 
351 TGGTGTTTGC GGAGCAGCCC TCGGTGAAGC TGTGCTGTCA GCTCTGCTGC 
401 AGCGTCTTCA AAGACCCCGT GATCACCACG TGTGGGCACA CGTTCTGTAG 
451 GAGATGCGCC TTGAAGTCAG AGAAGTGTCC CGTGGACAAC GTCAAACTGA
501 CCGTGGTGGT GAACAACATC GCGGTGGCCG AGCAGATCGG GGAGCTCTTC 
551 ATCCACTGCC GGCACGGCTG CCGGGTAGCG GGCAGCGGGA AGCCCCCCAT 
601 CTTTGAGGTG GACCCCCGAG GGTGCCCCTT CACCATCAAG CTCAGCGCCC 
651 GGAAGGACCA CGAGGGCAGC TGTGACTACA GGCCTGTGCG GTGTCCCAAC 
701 AACCCCAGCT GCCCCCCGCT GCTCAGGATG AACCTGGAGG CCCACCTCAA 
751 GGAGTGCGAG CACATCAAAT GCCCCCACTC CAAGTACGGG TGCACGTTCA 
801 TCGGGAACCA GGACACTTAC GAGACCCACC TGGAGACTTG CCGCTTCGAG 
851 GGCCTGAAGG AGTTTCTGCA GCAGACGGAT GACCGCTTCC ACGAGATGCA 
901 CGTGGCTCTG GCCCAGAAGG ACCAGGAGAT CGCCTTCCTG CGCTCCATGC 
951 TGGGAAAGCT CTCGGAGAAG ATCGACCAGC TAGAGAAGAG CCTGGAGCTC 
1001 AAGTTTGACG TCCTGGACGA AAACCAGAGC AAGCTCAGCG AGGACCTCAT 
1051 GGAGTTCCGG CGGGACGCAT CCATGTTAAA TGACGAGCTG TCCCACATCA 
1101 ACGCGCGGCT GAACATGGGC ATCCTAGGCT CCTACGACCC TCAGCAGATC 
1151 TTCAAGTGCA AAGGGACCTT TGTGGGCCAC CAGGGCCCTG TGTGGTGTCT 
1201 CTGCGTCTAC TCCATGGGTG ACCTGCTCTT CAGTGGCTCC TCTGACAAGA
1251 CCATCAAGGT GTGGGACACA TGTACCACCT ACAAGTGTCA GAAGACACTG 
1301 GAGGGCCATG ATGGCATCGT GCTGGCTCTC TGCATCCAGG GGTGCAAACT 
1351 CTACAGCGGC TCTGCAGACT GCACCATCAT TGTGTGGGAC ATCCAGAACC 
1401 TGCAGAAGGT GAACACCATC CGGGCCCATG ACAACCCGGT GTGCACGCTG 
1451 GTCTCCTCAC ACAACGTGCT CTTCAGCGGC TCCCTGAAGG CCATCAAGGT
1501 CTGGGACATC GTGGGCACTG AGCTGAAGTT GAAGAAGGAG CTCACAGGCC 
1551 TCAACCACTG GGTGCGGGCC CTGGTGGCTG CCCAGAGCTA CCTGTACAGC 
1601 GGCTCCTACC AGACAATCAA GATCTGGGAC ATCCGAACCC TTGACTGCAT 
1651 CCACGTCCTG CAGACGTCTG GTGGCAGCGT CTACTCCATT GCTGTGACAA 
1701 ATCACCACAT TGTCTGTGGC ACCTACGAGA ACCTCATCCA CGTGTGGGAC 

1751 ATTGAGTCCA AGGAGCAGGT GCGGACCCTC ACGGGCCACG TGGGCACCGT
1801 GTATGCCCTG GCGGTCATCT CGACGCCAGA CCAGACCAAA GTCTTCAGTG 

1851 CATCCTACGA CCGGTCCCTC AGGGTCTGGA GTATGGACAA CATGATCTGC
1901 ACGCAGACCC TGCTGCGTCA CCAGGGCAGT GTCACCGCGC TGGCTGTGTC 
1951 CCGGGGCCGA CTCTTCTCAG GGGCTGTGGA TAGCACTGTG AAGGTTTGGA 
2001 CTTGCTAACA GGATCCAGGC CAGGCTGTGG TTTCCCCTGA ACCAGCCCTG 
2051 GACCTTTCTG AGCCAGGCTG GCCACATGGG GTGGTCTCGG GGTTTCTGCC 
2101 TGCCCCGTGG GCATAGGTGG ACAGGCTCTG GCAGCCGGGC AGTGCCCTCC 
2151 CCGTCCCATG CTCGGCGAGC CTCCCTCTAC TCGGCACTGT CCTTGCTGCC 
2201 CAGCCCCTCT CTGGGTGCCA GGTACGACGC TTGCCCCGGC CCACCCTCCA 
2251 TCCCCACCCT CCATCCCCAC CCTAGATGGA GCGAGGGCCT TTTTACTCAC 
2301 CTTTTCTACC GTTTTTAGAC TGTATGTAGA TTTGGTTACC TCCTGGTTGA
2351 AATAAATGCT CCACAGACTG TGGCTGTGAG TGGGGACAGC TCCTCGGGAC
2401 AAGGGGGCTG TGTGTGGCCT TGAGGTTGGT GTGCACAGGC ACTGGCTGCT
2451 GTGAGTGGGG GGGCATGGGG CAGTTTCCTT TGGTGGACCC CAGGACTTCG
2501 GCCCACTCCG GGGCCTCCCC TCCCTGCTAG GAGGCAACTC GTCACACCCA
2551 AGCTGCTGGC CTCCAGTCCC ATCTCCCCCA ACACATGTGC CCCCAAAAAG
2601 TGAGCCAGGC ACCTCTGTTT CCTGCTGTTT ATTGACAGCC GACGGCAGCG
2651 CCTTGCCCAG ACCTCCCCTG CCCACCTGCT GGAGCCCAGC CTGTGCCGCC
2701 CTCTGAGGAG AGGCCTGGGG GGACAGCTGG GCACGTCCAC TCGCAGGGAA
2751 ACACGGGGTG AGACAGCAGG AAGGGGCCCT GCACGCCGGG ACGCCACCTC
2801 CGCCAGCCGC CTCCACCCGC CCCACACCAC AATCGCTGGT TTTCGGCATT
2851 TTTTAAATTT TTTTTTTAAG AAACGTCAAA GTTGTGCCCA ACACTGTGGA
2901 TCAGCAAACA CGATAGAGGA GACCAGTCAG TACTTCTTGG AGGGGGCAGG
2951 AGGAGAGAGG AAAAGGGAGG GCGAGAATGA CCACACAACA CAGCCTTGGA
3001 CCATGAGCAG AAGCGTCCGT GGGAACTCCA CTGGGGTGGA TGGGCTGCCT
3051 GCACAGCCCC TGGAGAGGGG GCCAGGCACA CCCTCAGAGG AGCTGCAAGC
3101 CCGTGGCCTG GCCTGCTACA TGCCCTGCTT CCACGTGGCT GCCACGCTGA
3151 CACACCCACA TTCACCAAAC CCACCCGCGC CCTGGGACGC AGCCACGCCA
3201 GGAGGAGGAC ACGGCCGCCG AGAGCAAGGC ACAACCTCGA GTTCTTGGGG
3251 CGCAGAGAAC TTAGGAGAGA AGCACGGAGG AGCCCCCGGC AGAGCACCCG
3301 CCCCCGGGCC CCAGCCTTCC ACCTGTGCTA GCAGCCTGGG GCCTCCACTC
3351 TGGCCGGAGG AAGGACCGCA GGCAGACAGC CTGGGCCTCT AACAGCTTTT
3401 GTCCGGAGCT AGACTTCGTG TCCTTTCAGT TGGTAAATGG TTTTCTATAG
3451 AATCAATAAT ATTTCTTTCT TTAAATATAT ATTTGTTAAA GTTATACCTT
3501 TTTGTTTCTC TGGGGAAATC CGCCTCAGCT CATTCCCAAT AAATTAATAC
3551 TCTTGATAAA AAAAAAAAAA AGAAAAAAAA AAAA
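
The polyadenylation signal position reported for this insert can be cross-checked with a short script. The following is a minimal sketch, assuming the canonical hexamer AATAAA is the signal that was searched for (the report does not name the motif) and that positions are counted from zero; the function name `find_polya_signal` is hypothetical. The input is the last 84 bases of the insert, that is, the rows numbered 3501 and 3551 with spaces removed.

```python
def find_polya_signal(seq, signal="AATAAA", offset=0):
    """Return the 0-based position of the first signal hit, or None if absent.

    `offset` is the 0-based position of seq[0] within the full insert.
    The default hexamer is an assumption, not something the report states.
    """
    hit = seq.upper().find(signal)
    return None if hit < 0 else offset + hit


# Rows 3501 and 3551 of the insert, concatenated (50 + 34 bases).
tail = (
    "TTTGTTTCTCTGGGGAAATCCGCCTCAGCTCATTCCCAATAAATTAATAC"
    "TCTTGATAAAAAAAAAAAAAAGAAAAAAAAAAAA"
)

# The row label 3501 is 1-based, so the tail starts at 0-based offset 3500.
signal_pos = find_polya_signal(tail, offset=3500)
print(signal_pos)  # 3537
```

Under these assumptions the hit agrees with the reported signal position of 3537. The tail length also agrees with the stated insert length, since 3500 + 84 = 3584.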



BLAST Results 



Entry HSBE from database EMBL: 

Homo sapiens (clone exon trap d5) chromosome 16p13.3 gene, exon.
Score = 2375, P = 7.1e-101, identities = 475/475 

Entry HSBD from database EMBL: 

Homo sapiens (clone exon trap d32) chromosome 16p13.3 gene, exon.
Score = 876, P = 3.0e-31, identities = 176/177 



Medline entries 



95122486:

Structural analysis of myosin heavy chain kinase A from Dictyostelium. Evidence for a
highly divergent protein kinase domain, an amino-terminal coiled-coil domain, and a
domain homologous to the beta-subunit of heterotrimeric G proteins.

96149460:

Dictyostelium myosin heavy chain kinase A regulates myosin localization during growth
and development.

97277316:

Identification of a protein kinase from Dictyostelium with homology to the novel
catalytic domain of myosin heavy chain kinase A.

96009891:

A gene responsible for vegetative incompatibility in the fungus Podospora anserina
encodes a protein with a GTP-binding motif and G beta homologous domain.



Peptide information for frame 2 



ORF from 224 bp to 2005 bp; peptide length: 594 

Category: similarity to known protein 

Prosite motifs: ZINC_FINGER_C3HC4 (70-80)

LEUCINE_ZIPPER (436-458)

G_BETA_REPEATS (335-355)

G_BETA_REPEATS (376-391)

1 MPPISTPRRS DSAISVRSLH SESSMSLRST FSLPEEEEEP EPLVFAEQPS

51 VKLCCQLCCS VFKDPVITTC GHTFCRRCAL KSEKCPVDNV KLTVVVNNIA

101 VAEQIGELFI HCRHGCRVAG SGKPPIFEVD PRGCPFTIKL SARKDHEGSC 

151 DYRPVRCPNN PSCPPLLRMN LEAHLKECEH IKCPHSKYGC TFIGNQDTYE 

201 THLETCRFEG LKEFLQQTDD RFHEMHVALA QKDQEIAFLR SMLGKLSEKI 

251 DQLEKSLELK FDVLDENQSK LSEDLMEFRR DASMLNDELS HINARLNMGI 

301 LGSYDPQQIF KCKGTFVGHQ GPVWCLCVYS MGDLLFSGSS DKTIKVWDTC 

351 TTYKCQKTLE GHDGIVLALC IQGCKLYSGS ADCTIIVWDI QNLQKVNTIR 

401 AHDNPVCTLV SSHNVLFSGS LKAIKVWDIV GTELKLKKEL TGLNHWVRAL 

451 VAAQSYLYSG SYQTIKIWDI RTLDCIHVLQ TSGGSVYSIA VTNHHIVCGT 

501 YENLIHVWDI ESKEQVRTLT GHVGTVYALA VISTPDQTKV FSASYDRSLR 

551 VWSMDNMICT QTLLRHQGSV TALAVSRGRL FSGAVDSTVK VWTC 
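
The ORF coordinates and the peptide length reported above can be checked against each other, and the start of the ORF can be translated to confirm the reading frame. The sketch below assumes the standard genetic code and 1-based, inclusive coordinates that exclude the stop codon (which is what the numbers in this report appear to imply); the helper names `orf_peptide_length` and `translate` are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for a 1-based, inclusive ORF that excludes the stop codon."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate complete codons of a DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 224-250 of the insert, read from the row numbered 201.
orf_start = "ATGCCCCCCATCAGCACTCCCCGCCGC"

print(orf_peptide_length(224, 2005))  # 594
print(translate(orf_start))           # MPPISTPRR
```

The translated prefix matches the first nine residues of the peptide, and the computed length matches the stated 594 amino acids.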

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_li2, frame 2

SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK 
B)., N = 1, Score = 419, P = 3.6e-37 

SWISSPROT:HET1_PODAN VEGETATIBLE INCOMPATIBILITY PROTEIN HET-E-1., N = 
1, Score = 392, P = 3.1e-33 

SWISSPROT:YDJ5_SCHPO HYPOTHETICAL 67.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN C57A10.05C IN CHROMOSOME I., N = 1, Score = 357, P = 4.1e-30 

TREMBL:AF032878_1 gene: "slimb"; product: "Slimb"; Drosophila
melanogaster Slimb (slimb) mRNA, complete cds., N = 1, Score = 347, P =
1.7e-29



>SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B) . 
Length = 732 

HSPs : 



Score = 419 (62.9 bits), Expect = 3.6e-37, P = 3.6e-37
Identities = 96/268 (35%), Positives = 158/268 (58%)

Query: 325 CLCVYSMGDLLFSGSSDKTIKVWD-TCTTYKCQKTLEGHDGIVLALCIQGCKLYSGSADC 383
           C+C +LLF+G SD +I+V+D +C +TL+GH+G V ++C L+SGS+D
Sbjct: 467 CIC----DNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESICYNDQYLFSGSSDH 522

Query: 384 TIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KAIKVWDIVGTELKLKKELTG 442
           +I VWD++ L+ + T+ HD PV T++ + LFSGS K IKVWD+ L+ K L
Sbjct: 523 SIKVWDLKKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKVWDL--KTLECKYTLES 580

Query: 443 LNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTY 501
           V+ L + YL+SGS +TIK+WD++T C + L+ V +I + ++ G+Y
Sbjct: 581 HARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVTTICILGTNLYSGSY 640

Query: 502 ENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFSASYDRSLRVWSMDNMICTQ 561
           + I VW+++S E TL GH V + + D+ +F+AS D ++++W ++ + C
Sbjct: 641 DKTIRVWNLKSLECSATLRGHDRWVEHMVIC---DKL-LFTASDDNTIKIWDLETLRCNT 696

Query: 562 TLLRHQGSVTALAVSRGR--LFSGAVDSTVKVW 592
           TL H +V LAV + + S + D +++VW
Sbjct: 697 TLEGHNATVQCLAVWEDKKCVISCSHDQSIRVW 729




Score = 415 (62.3 bits), Expect = 1.2e-36, P = 1.2e-36
Identities = 113/303 (37%), Positives = 166/303 (54%)

Query: 255 KSLEL-KFDVLDENQSKLSEDLMEFRRDASMLNDEL-SHINARLNMGILGS-------YD 305
           KS++L K ++L N+ K S +L + ++ + SH+ N+ G YD
Sbjct: 427 KSIDLEKPEILINNKKKESINLETIKLIETIKGYHVTSHLCICDNLLFTGCSDNSIRVYD 486

Query: 306 -PQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDG 364
           Q +C T GH+GPV +C Y+ LFSGSSD +IKVWD +C TLEGHD
Sbjct: 487 YKSQNMECVQTLKGHEGPVESIC-YN-DQYLFSGSSDHSIKVWDL-KKLRCIFTLEGHDK 543

Query: 365 IVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KA 423
           V + + L+SGS+D TI VWD++ L+ T+ +H V TL S LFSGS K
Sbjct: 544 PVHTVLLNDKYLFSGSSDKTIKVWDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKT 603

Query: 424 IKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTS 482
           IKVWD+ + L G WV + + LYSGSY +TI++W++++L+C L+
Sbjct: 604 IKVWDL--KTFRCNYTLKGHTKWVTTICILGTNLYSGSYDKTIRVWNLKSLECSATLRGH 661

Query: 483 GGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFS 542
           V+ + ++ ++NI +WD+E+ TL GH TV LAV D+ V S
Sbjct: 662 DRWVEHMVICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWE--DKKCVIS 719

Query: 543 ASYDRSLRVW 552
           S+D+S+RVW
Sbjct: 720 CSHDQSIRVW 729




Score = 262 (39.3 bits), Expect = 3.2e-19, P = 3.2e-19
Identities = 60/184 (32%), Positives = 109/184 (59%)

Query: 352 TYKCQKTLEGHDGIVLALCIQGCKLYSGSADCTIIVWDI--QNLQKVNTIRAHDNPVCTL 409
           T K +T++G+ + LCI L++G +D +I V+D QN++ V T++ H+ PV ++
Sbjct: 450 TIKLIETIKGYH-VTSHLCICDNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESI 508

Query: 410 VSSHNVLFSGSLK-AIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKI 467
           4- T trcr-Q 4-T tfWWn-*- 4-T 4- T fZ 4- \1 +4- VT 4-C:r:Q 4-TTK4-
Sbjct: 509 CYNDQYLFSGSSDHSIKVWDL--KKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKV 566

Query: 468 WDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVY 527
           WD++TL+C + L++ +V ++ ++ ++ G+ + I VWD+++ TL GH V
Sbjct: 567 WDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVT 626

Query: 528 ALAVIST 534
           + ++ T
Sbjct: 627 TICILGT 633




Score = 173 (26.0 bits), Expect = 1.7e-09, P = 1.7e-09
Identities = 43/118 (36%), Positives = 65/118 (55%)

Query: 310 FKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDGIVLAL 369
           F+C T GH V +C+ +G L+SGS DKTI+VW+ + +C TL GHD V +
Sbjct: 612 FRCNYTLKGHTKWVTTICI--LGTNLYSGSYDKTIRVWNL-KSLECSATLRGHDRWVEHM 668

Query: 370 CIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPV-CTLVSSHN--VLFSGSLKAIKV 426
           I L++ S D TI +WD++ L+ T+ H+ V C V V+ ++I+V
Sbjct: 669 VICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWEDKKCVISCSHDQSIRV 728



Query: 427 W 427 
W 

Sbjct: 729 W 729 



Pedant information for DKFZphutel_li2, frame 2 



Report for DKFZphutel_li2.2

[LENGTH] 594
[MW] 66541.94
[pI] 6.64
[HOMOL] SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B). 3e-37
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 5e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YCR072c beta-transducin family] 2e-15
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YFL009w] 1e-14
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145C] 1e-13
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 1e-13
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 2e-11
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 3e-11
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 8e-09
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-07
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116C] 5e-07
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116C] 5e-07
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 3e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL021c] 2e-04
[FUNCAT] 01.03.07 deoxyribonucleotide metabolism [S. cerevisiae, YOR269w] 2e-04
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YOR212w] 0.001
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR212w] 0.001
[BLOCKS] BL00678
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins
[SCOP] d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-10
[EC] 2.7.1.129 Myosin-heavy-chain kinase 3e-26
[PIRKW] phosphotransferase 3e-26
[PIRKW] nucleus 1e-06
[PIRKW] plasma 9e-08
[PIRKW] duplication 3e-25
[PIRKW] hormone 9e-08
[PIRKW] zinc 3e-09
[PIRKW] cell cycle control 4e-13
[PIRKW] transmembrane protein 3e-12
[PIRKW] zinc finger 1e-08
[PIRKW] stomach 9e-08
[PIRKW] DNA binding 9e-06
[PIRKW] autophosphorylation 3e-26
[PIRKW] phosphoprotein 3e-26
[PIRKW] signal transduction 5e-08
[PIRKW] heterotrimer 5e-08
[PIRKW] coiled coil 3e-26
[PIRKW] multimer 3e-26
[PIRKW] transcription regulation 4e-10
[PIRKW] GTP binding 5e-08
[SUPFAM] chromobox homology 9e-06
[SUPFAM] RING finger homology 3e-09
[SUPFAM] coatomer complex beta 1 chain 1e-07
[SUPFAM] WD repeat homology 3e-26
[SUPFAM] yeast coatomer complex alpha chain 3e-12
[SUPFAM] GTP-binding regulatory protein beta chain 5e-08
[SUPFAM] PRL1 protein 2e-09
[PROSITE] WD_REPEATS 2
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 14
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] ZINC_FINGER_C3HC4 1
[PROSITE] PKC_PHOSPHO_SITE 18
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Zinc finger, C3HC4 type (RING finger)
[PFAM] WD domain, G-beta repeats
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 6.23 %
[KW] COILED_COIL 6.73 %



SEQ MPPISTPRRSDSAISVRSLHSESSMSLRSTFSLPEEEEEPEPLVFAEQPSVKLCCQLCCS

SEG xxxxxxxxxxxxxxx . . . . xxxxxxxxx 

COILS 

lgg2B 

SEQ VFKDPVITTCGHTFCRRCALKSEKCPVDNVKLTVVVNNIAVAEQIGELFIHCRHGCRVAG 

SEG 

COILS 

lgg2B 

SEQ SGKPPIFEVDPRGCPFTIKLSARKDHEGSCDYRPVRCPNNPSCPPLLRMNLEAHLKECEH

SEG 

COILS 

lgg2B 

SEQ IKCPHSKYGCTFIGNQDTYETHLETCRFEGLKEFLQQTDDRFHEMHVALAQKDQEIAFLR 

SEG 

COILS CCCCCCCCCCCCCC 

lgg2B 

SEQ SMLGKLSEKIDQLEKSLELKFDVLDENQSKLSEDLMEFRRDASMLNDELSHINARLNMGI 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

lgg2B 

SEQ LGSYDPQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLE 

SEG 

COILS 

1gg2B EECCCCCCEEEEEETTTTCEEEEEETTTEEEEEEG-GGCEEEEEEE

SEQ GHDGIVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGS

SEG 

COILS 

lgg2B CCCCCEEEEEETTCEEEEEETTTCEEEEETTTTEEEEEE-CTTTTCCCEEE 

SEQ LKAIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSYQTIKIWDIRTLDCIHVLQ

SEG xxxxxxxxxxxxx 

COILS 

lgg2B 

SEQ TSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKV 

SEG 

COILS 

lgg2B 

SEQ FSASYDRSLRVWSMDNMICTQTLLRHQGSVTALAVSRGRLFSGAVDSTVKVWTC

SEG 

COILS 

lgg2B 



Prosite for DKFZphutel_li2.2

PS00001   267->271   ASN_GLYCOSYLATION   PDOC00001
PS00005   6->9       PKC_PHOSPHO_SITE    PDOC00005
PS00005   15->18     PKC_PHOSPHO_SITE    PDOC00005
PS00005   26->29     PKC_PHOSPHO_SITE    PDOC00005
PS00005   50->53     PKC_PHOSPHO_SITE    PDOC00005
PS00005   82->85     PKC_PHOSPHO_SITE    PDOC00005
PS00005   121->124   PKC_PHOSPHO_SITE    PDOC00005
PS00005   137->140   PKC_PHOSPHO_SITE    PDOC00005
PS00005   141->144   PKC_PHOSPHO_SITE    PDOC00005
PS00005   205->208   PKC_PHOSPHO_SITE    PDOC00005
PS00005   247->250   PKC_PHOSPHO_SITE    PDOC00005
PS00005   340->343   PKC_PHOSPHO_SITE    PDOC00005
PS00005   343->346   PKC_PHOSPHO_SITE    PDOC00005
PS00005   352->355   PKC_PHOSPHO_SITE    PDOC00005
PS00005   398->401   PKC_PHOSPHO_SITE    PDOC00005
PS00005   420->423   PKC_PHOSPHO_SITE    PDOC00005
PS00005   464->467   PKC_PHOSPHO_SITE    PDOC00005
PS00005   548->551   PKC_PHOSPHO_SITE    PDOC00005
PS00005   588->591   PKC_PHOSPHO_SITE    PDOC00005
PS00006   32->36     CK2_PHOSPHO_SITE    PDOC00006
PS00006   201->205   CK2_PHOSPHO_SITE    PDOC00006
PS00006   330->334   CK2_PHOSPHO_SITE    PDOC00006
PS00006   533->537   CK2_PHOSPHO_SITE    PDOC00006
PS00008   115->121   MYRISTYL            PDOC00008
PS00008   133->139   MYRISTYL            PDOC00008
PS00008   194->200   MYRISTYL            PDOC00008
PS00008   299->305   MYRISTYL            PDOC00008
PS00008   314->320   MYRISTYL            PDOC00008
PS00008   364->370   MYRISTYL            PDOC00008
PS00008   379->385   MYRISTYL            PDOC00008
PS00008   419->425   MYRISTYL            PDOC00008
PS00008   460->466   MYRISTYL            PDOC00008
PS00008   484->490   MYRISTYL            PDOC00008
PS00008   499->505   MYRISTYL            PDOC00008
PS00008   524->530   MYRISTYL            PDOC00008
PS00008   568->574   MYRISTYL            PDOC00008
PS00008   583->589   MYRISTYL            PDOC00008
PS00518   70->80     ZINC_FINGER_C3HC4   PDOC00449
PS00029   436->458   LEUCINE_ZIPPER      PDOC00029
PS00678   335->350   WD_REPEATS          PDOC00574
PS00678   376->391   WD_REPEATS          PDOC00574
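
The short Prosite patterns listed for this protein can be reproduced with a regular expression. The sketch below scans the first 60 residues of the peptide for the protein kinase C phosphorylation site pattern [ST]-x-[RK] (PS00005); the function name `pkc_site_starts` is hypothetical, and a lookahead is used so that overlapping sites are not skipped. Only start positions are returned, counted from 1.

```python
import re

# PS00005: [ST]-x-[RK]; the lookahead makes overlapping matches visible.
PKC_SITE = re.compile(r"(?=[ST].[RK])")


def pkc_site_starts(protein):
    """Return 1-based start positions of PKC_PHOSPHO_SITE pattern matches."""
    return [m.start() + 1 for m in PKC_SITE.finditer(protein)]


# Residues 1-60 of the 594 amino acid peptide.
n_term = "MPPISTPRRSDSAISVRSLHSESSMSLRSTFSLPEEEEEPEPLVFAEQPSVKLCCQLCCS"

print(pkc_site_starts(n_term))  # [6, 15, 26, 50]
```

These four start positions correspond to the first four PKC_PHOSPHO_SITE rows of the Prosite listing for this protein.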



Pfam for DKFZphutel_li2.2
HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFS PDGrWFI vSGSWDgTCRLWD* 

++GH ++VWC+ + G + ++SGS D+T+++WD 
Query 316 FVGHQGPVWCLCVYSMGDL-LFSGSSDKTIKVWD 348 

22.93 519 553 1 34 dkfzphutel_li2.2 similarity to Dictyostelium myosin heavy chain kinase

Alignment to HMM consensus:
Query *MrGHnnWVWCVaF..SPDGrWFIvSGSWDgTCRLWD*

++GH ++V+++A+ +PD ++S+S D+++R+W+

dkfzphutel 519 LTGHVGTVYALAVISTPDQTK-VFSASYDRSLRVWS 553



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW. .CPmC* 

C++C + F++P++++CGH+FC+ C +++ CP+ 

Query 55 CQLC CSV FKDPVITTCGHTFCRRCALKSEKCPVD 88

group: metabolism



DKFZphutel_20bl9 encodes a novel 486 amino acid protein with similarity to bacterial sarcosine
oxidases (EC 1.5.3.1).

The novel protein appears to be an enzyme with sarcosine oxidase activity.

The new protein can find application in modulation of sarcosine metabolism and as a new enzyme
for biotechnological production processes.



similarity to sarcosine oxidases 
membrane regions: 1 

Summary DKFZphutel_20bl9 encodes a novel 486 amino acid protein, with 
similarity to sarcosine oxidases. 



similarity to sarcosine oxidases 

complete cDNA?, complete cds potential start at Bp 48, EST hits, 
Sequenced by AGOWA 
Locus: unknown 



Insert length: 1967 bp 

Poly A stretch at pos. 1950, no polyadenylation signal found 



1 AGCGAGGCAG CAGTGCAGCT TTCAGAGGGT CCGGGCTCAG AGGGGTTATG 
51 ATTCGGAGGG TTCTGCCGCA CGGCATGGGC CGGGGCCTCT TGACCCGGAG 
101 GCCAGGCACG CGCAGAGGAG GCTTTTCTCT GGACTGGGAT GGAAAGGTGT 
151 CTGAGATTAA GAAGAAGATC AAGTCGATCC TGCCTGGAAG GTCCTGTGAT 
201 CTACTGCAAG ACACCAGCCA CCTGCCTCCC GAGCACTCGG ATGTGGTGAT 
251 CGTGGGAGGT GGGGTGCTTG GCTTGTCTGT GGCCTATTGG CTGAAGAAGC
301 TGGAGAGCAG ACGAGGTGCT ATTCGAGTGC TAGTGGTGGA ACGGGACCAC 
351 ACGTATTCAC AGGCCTCCAC TGGGCTCTCA GTAGGTGGGA TTTGTCAGCA 
401 GTTCTCATTG CCTGAGAACA TCCAGCTCTC CCTCTTTTCA GCCAGCTTTC
451 TACGGAACAT CAATGAGTAC CTGGCCGTAG TCGATGCTCC TCCCCTGGAC
501 CTCCGGTTCA ACCCCTCGGG CTACCTCTTG CTGGCTTCAG AAAAGGATGC
551 TGCAGCCATG GAGAGCAACG TGAAAGTGCA GAGGCAGGAG GGAGCCAAAG 
601 TTTCTCTGAT GTCTCCTGAT CAGCTTCGGA ACAAGTTTCC CTGGATAAAC 
651 ACAGAGGGAG TGGCTTTGGC GTCTTATGGG ATGGAGGACG AAGGTTGGTT 
701 TGACCCCTGG TGTCTGCTCC AGGGGCTTCG GCGAAAGGTC CAGTCCTTGG 
751 GAGTCCTTTT CTGCCAGGGA GAGGTGACAC GTTTTGTCTC TTCATCTCAA 
801 CGCATGTTGA CCACAGATGA CAAAGCGGTG GTCTTGAAAA GGATCCATGA 
851 AGTCCATGTG AAGATGGACC GCAGCCTGGA GTACCAGCCT GTGGAATGCG 
901 CCATTGTGAT CAACGCAGCC GGAGCCTGGT CTGCGCAAAT CGCAGCACTG 
951 GCTGGTGTTG GAGAGGGGCC GCCTGGCACC CTGCAGGGCA CCAAGCTACC 
1001 TGTGGAGCCG AGGAAAAGGT ATGTGTATGT GTGGCACTGC CCCCAGGGAC 
1051 CAGGCCTAGA GACTCCGCTT GTTGCAGACA CCAGTGGAGC CTATTTTCGC
1101 CGGGAAGGAT TAGGTAGCAA CTACCTAGGT GGTCGTAGCC CCACTGAGCA 
1151 GGAAGAACCG GACCCGGCGA ACCTGGAAGT GGACCATGAT TTCTTCCAGG 
1201 ACAAGGTGTG GCCCCATTTG GCCCTGAGGG TCCCAGCTTT TGAGACTCTG 
1251 AAGGTTCAGA GCGCCTGGGC CGGCTATTAC GACTACAACA CCTTTGACCA 
1301 GAATGGCGTG GTGGGCCCCC ACCCGCTAGT TGTCAACATG TACTTTGCTA 
1351 CTGGCTTCAG TGGTCACGGG CTCCAGCAGG CCCCTGGCAT TGGGCGAGCT 
1401 GTAGCAGAGA TGGTACTGAA GGGCAGGTTC CAGACCATCG ACCTGAGCCC 
1451 CTTCCTCTTT ACCCGCTTTT ACTTGGGAGA GAAGATCCAG GAGAACAACA
1501 TCATCTGAGC ATGTGTGCTC TGCACTGGCT CCACTGGCTT GCATCCTGGC 
1551 TGTGTTCACA GCCTTGTTTG CTGCTTCCAT CTTCCCCAGT ACTGTGCCAG 
1601 GCCTTCTCCC CCTCCCCAGT GTCCTCTCCT CTCAGGCAGG CCATTGCACC 
1651 CATATGGCTG GGCAGGCACA GGCAGTGAGG CCGAGGCCAA TAGCGAGTGA 
1701 TGAGCGGGAT CCTAGGACTG ATCTGTAGCC CATGCTGATG TCACCCACCA
1751 GGGCAATCCA TCTGGACGCC TGAGCACCCT GGCCCAGGAC TGGCTTCATC 
1801 CTGGCACTGA CCAGGAAAGA CTGCCTCTGA CCCTCTTAGC AGACAGAGCC 
1851 CAGGCATGGG AGCACTCTGG GGCAGCCTGG CTCAGGTTTA TTGATTTTCG 
1901 TCTGTTTACC CTATCCATTA ATCAATACAT GTAATTAACT CCTTCCCTCC 
1951 AAAAAAAAAA AAAAAAA 
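
The numbered sequence rows used in these listings (a position label followed by blocks of ten bases) can be turned back into a plain sequence string with a few lines of code. The following is a minimal sketch; the function name `parse_numbered_rows` is hypothetical. It drops everything that is not a base letter, so split position labels and stray spaces inside a block do not matter. The example uses the last two rows of this insert and checks the result against the stated insert length of 1967 bp.

```python
import re


def parse_numbered_rows(rows):
    """Concatenate the bases of numbered sequence rows, ignoring labels and spaces."""
    return "".join(re.sub(r"[^ACGTNacgtn]", "", row) for row in rows).upper()


last_rows = [
    "1901 TCTGTTTACC CTATCCATTA ATCAATACAT GTAATTAACT CCTTCCCTCC",
    "1951 AAAAAAAAAA AAAAAAA",
]

tail = parse_numbered_rows(last_rows)
# The first row label is 1-based, so the last base sits at 1901 + len(tail) - 1.
last_position = 1901 + len(tail) - 1
print(len(tail), last_position)  # 67 1967
```

The last position agrees with the insert length given for this clone, and the final 17 bases form the poly A stretch.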



BLAST Results 



No BLAST result

Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 48 bp to 1505 bp; peptide length: 486 
Category: similarity to known protein 



1 MIRRVLPHGM GRGLLTRRPG TRRGGFSLDW DGKVSEIKKK IKSILPGRSC 

51 DLLQDTSHLP PEHSDVVIVG GGVLGLSVAY WLKKLESRRG AIRVLVVERD 

101 HTYSQASTGL SVGGICQQFS LPENIQLSLF SASFLRNINE YLAVVDAPPL

151 DLRFNPSGYL LLASEKDAAA MESNVKVQRQ EGAKVSLMSP DQLRNKFPWI 

201 NTEGVALASY GMEDEGWFDP WCLLQGLRRK VQSLGVLFCQ GEVTRFVSSS 

251 QRMLTTDDKA VVLKRIHEVH VKMDRSLEYQ PVECAIVINA AGAWSAQIAA 

301 LAGVGEGPPG TLQGTKLPVE PRKRYVYVWH CPQGPGLETP LVADTSGAYF 

351 RREGLGSNYL GGRSPTEQEE PDPANLEVDH DFFQDKVWPH LALRVPAFET 

401 LKVQSAWAGY YDYNTFDQNG VVGPHPLVVN MYFATGFSGH GLQQAPGIGR 

451 AVAEMVLKGR FQTIDLSPFL FTRFYLGEKI QENNII

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20bl9, frame 3 

TREMBL: CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2, 
N = 1, Score = 801, P = 9.2e-80 

PIR:B71184 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, 
Score = 194, P = 2e-26 

PIR:B69284 sarcosine oxidase, subunit beta (soxB) homolog - 
Archaeoglobus fulgidus, N = 3, Score = 189, P = 8.2e-22

TREMBL:AF042732_1 gene: "Bb"; product: "unknown protein"; Anopheles
gambiae (Bb) gene, partial cds; and TU37B2 (TU37B2) and diphenol 
oxidase-A2 (Dox-A2) genes, complete cds., N = 1, Score = 386, P = 
8.7e-36 

PIR:F71008 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2, 
Score = 200, P = 4e-25 



>TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 
Length = 527 

HSPS : 

Score = 801 (120.2 bits), Expect = 9.2e-80, P = 9.2e-80
Identities = 171/433 (39%), Positives = 260/433 (60%) 

Query: 61 PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 120 

P +++VI+GGG+ G S A+WLK+ R +V+VVE + ++++ST LS GGI QQFS 
Sbjct: 91 PYRAEIVIIGGGLSGSSTAFWLKE-RFRDEDFKVVVVENNDVFTKSSTMLSTGGITQQFS 149 

Query: 121 LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLA-SEKDAAAMESNVKVQR 179 

+PE + +SLF+ FLR+ E+L ++D+ D+ F P+GYL LA ++++ M S KVQ 
Sbjct: 150 IPEFVDMSLFTTEFLRHAGEHLRILDSEQPDINFFPTGYLRLAKTDEEVEMMRSAWKVQI 209 

Query: 180 QEGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFC 239 

+ GAKV L+S D+L ++P++N + V LAS G+E+EG D W LL +R K +LGV + 
Sbjct: 210 ERGAKVQLLSKDELTKRYPYMNVDDVLLASLGVENEGTIDTWQLLSAIREKNITLGVQYV 269 

Query: 240 QGEVTRFVSSSQRM----------LTTDDKAVVLKRIHEVHVKMDRS-LEYQPVECAIVI 288

+GEV F R T D+ + +RI V V+ + +P+ +++
Sbjct: 270 KGEVEGFQFERHRASSEVHAFGDDATADENKLRAQRISGVLVRPQMNDASARPIRAHLIV 329

Query: 289 NAAGAWSAQIAALAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTS-G 347 

NAAG W+ Q+A +AG+G+G G L +P++PRKR V+V P P +P+DSG 
Sbjct: 330 NAAGPWAGQVAKMAGIGKGT-GLL-AVPVPIQPRKRDVFVIFAPDVPS-DLPFIIDPSTG 386 

Query: 348 AYFRREGLGSNYLGGRSPTEQEEP--DPANLEVDHDFFQDKVWPHLALRVPAFETLKVQS 405

+ R+ G +L GR+P+++E+ D +NL+VD+D F K+WP L RVP F+T KV+S
Sbjct: 387 VFCRQTDSGQTFLVGRTPSKEEDAKRDHSNLDVDYDDFYQKIWPVLVDRVPGFQTAKVKS 446

Query: 406 AWAGYYDYNTFDQNGVVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTID 465

AW+GY D NTFD V+G HPL N++ GF G+ + RA AE + G + ++ 
Sbjct: 447 AWSGYQDINTFDDAPVIGEHPLYTNLHMMCGFGERGVMHSMAAARAYAERIFDGAYINVN 506 

Query: 466 LSPFLFTRFYLGEKIQE 482

L F R + I E
Sbjct: 507 LRKFDMRRIVKMDPITE 523
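
The identity counts in a BLAST report can be recomputed from the aligned rows. The sketch below counts identical, non-gap columns in the first Query/Sbjct pair of this HSP (Query 61-120 against Sbjct 91-149) and shows that the percentages printed in the report are consistent with integer truncation rather than rounding; that truncation rule is inferred from the numbers in this document, and the function names `alignment_identity` and `blast_percent` are hypothetical.

```python
def alignment_identity(query, sbjct, gap="-"):
    """Return (identical columns, total columns) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)
    return identities, len(query)


def blast_percent(count, columns):
    """Integer percentage, truncated (assumed from the figures in the report)."""
    return 100 * count // columns


query = "PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS"
sbjct = "PYRAEIVIIGGGLSGSSTAFWLKE-RFRDEDFKVVVVENNDVFTKSSTMLSTGGITQQFS"

print(alignment_identity(query, sbjct))  # (28, 60)
print(blast_percent(171, 433))           # 39
print(blast_percent(260, 433))           # 60
```

The 28 identical columns match the number of letters on the midline of that block, and the two percentages reproduce the "Identities = 171/433 (39%), Positives = 260/433 (60%)" line of the HSP header.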



Pedant information for DKFZphutel_20bl9, frame 3 



Report for DKFZphutel_20bl9.3

[LENGTH] 486
[MW] 53811.85
[pI] 7.66
[HOMOL] TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 1e-78
[FUNCAT] c energy conversion [H. influenzae, HI0499] 8e-05
[BLOCKS] BL00677A D-amino acid oxidases proteins
[BLOCKS] BL00623A GMC oxidoreductases proteins
[BLOCKS] BL01304A
[EC] 1.5.99.2 Dimethylglycine dehydrogenase 2e-07
[PIRKW] flavoprotein 2e-07
[PIRKW] oxidoreductase 2e-07
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 5
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 6
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 7.00 %



SEQ MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP 

SEG xxxxxxxxxxxxxxx xxxxxxxx 

PRD ccceeecccccceeecccccccccccccccccchhhhhhhhhhccccccceeeccccccc 

MEM 

SEQ PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 

SEG xxxxxxxxxxx 

PRD cccceeeeeccccchhhhhhhhhhhhhhcccceeeeeeccccccccccccccccceeeec 

MEM MMMMMMMMMMMMMMMMM 

SEQ LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLASEKDAAAMESNVKVQRQ 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhccccceeecccceeeehhhhhhhhhhhhhhhhhh 

MEM 

SEQ EGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFCQ 

SEG 

PRD cccceeecccchhhhhhccccccccccccccccccccccccchhhhhhhhhhhheeeeec 

MEM 

SEQ GEVTRFVSSSQRMLTTDDKAVVLKRIHEVHVKMDRSLEYQPVECAIVINAAGAWSAQIAA 

SEG 

PRD ceeeeecccccccccccchhhhhhhhhheeeecccccccccceeeeeeecccchhhhhhh 

MEM 

SEQ LAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTSGAYFRREGLGSNYL 

SEG 

PRD hhccccccccccccccccccccceeeeeeecccccccccceeeccccceeeeccccccee 

MEM 

SEQ GGRSPTEQEEPDPANLEVDHDFFQDKVWPHLALRVPAFETLKVQSAWAGYYDYNTFDQNG 

SEG 

PRD ecccccccccccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhheeeeeccccccc 

MEM 

SEQ VVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTIDLSPFLFTRFYLGEKI 

SEG 

PRD cccccccccceeeecccccccccchhhhhhhhhhhhhhccceeeeccccccccccccccc 

MEM 

SEQ QENNII 

SEG 

PRD cccccc 

MEM

Prosite for DKFZphutel_20bl9.3

PS00002   438->442   GLYCOSAMINOGLYCAN   PDOC00002
PS00005   16->19     PKC_PHOSPHO_SITE    PDOC00005
PS00005   21->24     PKC_PHOSPHO_SITE    PDOC00005
PS00005   87->90     PKC_PHOSPHO_SITE    PDOC00005
PS00005   164->167   PKC_PHOSPHO_SITE    PDOC00005
PS00005   250->253   PKC_PHOSPHO_SITE    PDOC00005
PS00005   400->403   PKC_PHOSPHO_SITE    PDOC00005
PS00006   120->124   CK2_PHOSPHO_SITE    PDOC00006
PS00006   164->168   CK2_PHOSPHO_SITE    PDOC00006
PS00006   255->259   CK2_PHOSPHO_SITE    PDOC00006
PS00006   364->368   CK2_PHOSPHO_SITE    PDOC00006
PS00006   366->370   CK2_PHOSPHO_SITE    PDOC00006
PS00008   9->15      MYRISTYL            PDOC00008
PS00008   20->26     MYRISTYL            PDOC00008
PS00008   71->77     MYRISTYL            PDOC00008
PS00008   75->81     MYRISTYL            PDOC00008
PS00008   109->115   MYRISTYL            PDOC00008
PS00008   182->188   MYRISTYL            PDOC00008
PS00008   204->210   MYRISTYL            PDOC00008
PS00008   235->241   MYRISTYL            PDOC00008
PS00008   292->298   MYRISTYL            PDOC00008
PS00008   310->316   MYRISTYL            PDOC00008
PS00008   354->360   MYRISTYL            PDOC00008
PS00008   447->453   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphutel_20bl9 . 3) 
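
The residue ranges in the Prosite table can be reproduced with ordinary regular expressions. The following Python sketch is illustrative only and is not part of the original report: the three patterns are the PROSITE consensus patterns for PS00005, PS00006 and PS00008 as commonly published (they should be checked against the PROSITE documentation), the helper name `scan` is hypothetical, and the `start->end` convention (1-based start, end one past the last matched residue) is inferred from the table rather than stated in the listing.

```python
import re

# PROSITE consensus patterns rewritten as Python regular expressions
# (assumed from the public PROSITE entries; verify before relying on them).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",            # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",           # PS00006: [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # PS00008
}

def scan(seq, pattern):
    """Return (start, end) pairs for every match, overlapping matches included.

    start is 1-based; end is one past the last matched residue, which is the
    convention the Prosite table appears to use.
    """
    hits = []
    # The lookahead lets matches overlap; group 1 holds the matched residues.
    for m in re.finditer(f"(?=({pattern}))", seq):
        hits.append((m.start(1) + 1, m.end(1) + 1))
    return hits

# First 60 residues of the DKFZphutel_20bl9.3 peptide, copied from its first SEQ row.
first_60 = "MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP"

for name, pattern in PATTERNS.items():
    for start, end in scan(first_60, pattern):
        print(f"{name} {start}->{end}")
```

For this 60-residue window the sketch reports PKC_PHOSPHO_SITE at 16->19 and 21->24 and MYRISTYL at 9->15 and 20->26, which agrees with the first rows of the table; no CK2_PHOSPHO_SITE match falls inside the window.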
DKFZphutel_20g21



group: signal transduction 

DKFZphutel_20g21 encodes a novel 861 amino acid protein with partial similarity to human ras 
inhibitor and other ras inhibitor proteins. 

Ras is a signal-transducing molecule involved in the receptor tyrosine kinase/Ras/MAP kinase
signalling cascade. Ras proteins bind GDP/GTP and show intrinsic GTPase activity. Mutations in
ras that change amino acids 12, 13 or 61 activate the potential of ras to transform cultured cells
and are implicated in a variety of human tumours. The novel protein seems to be a new ras
inhibitor protein.

The new protein can find application in modulating/blocking ras-dependent signal transduction
pathways.



Ras inhibitor 

additional 1188 Bp at 5' and 110*7 at 3' end in comparison to 122483 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 4137 bp 

Poly A stretch at pos. 4116, no polyadenylation signal found 



1 GGGAGAACTG AAACAGGAGA TGGTGCGGAC AGATGTCAAC CTGGAAAATG
51 GCCTGGAACC CGCTGAAACC CACAGCATGG TAAGACACAA GGATGGTGGC
101 TATTCCGAGG AAGAGGACGT GAAGACCTGT GCCCGGGACT CAGGCTATGA
151 CAGCCTCTCC AACAGGCTCA GCATCTTGGA CCGGCTCCTC CACACCCACC
201 CCATATGGCT GCAGCTGAGT CTGAGTGAGG AGGAGGCAGC AGAGGTCCTG
251 CAGGCCCAGC CTCCGGGGAT CTTCCTGGTT CATAAATCTA CCAAGATGCA
301 GAAGAAAGTC CTCTCCCTCC GCCTGCCCTG TGAATTTGGG GCCCCACTCA
351 AGGAATTTGC CATAAAGGAA AGCACATACA CCTTTTCCCT GGAAGGCTCA
401 GGAATCAGTT TCGCAGATTT ATTCCGGCTC ATTGCTTTCT ACTGCATCAG
451 CAGGGATGTT CTACCATTTA CCTTGAAGTT GCCTTATGCC ATTTCAACAG
501 CCAAGTCGGA GGCTCAGCTT GAAGAACTGG CCCAGATGGG ACTAAATTTC
551 TGGAGCTCCC CAGCTGACAG CAAACCCCCG AACCTTCCAC CTCCCCATAG
601 GCCTCTTTCC TCCGACGGTG TCTGTCCTGC CTCCCTGCGT CAGCTCTGCC
651 TTATAAATGG AGTGCATTCT ATCAAAACCA GGACGCCTTC AGAGCTGGAG
701 TGCAGCCAGA CCAACGGGGC CCTGTGCTTT ATTAATCCCC TTTTCTTGAA
751 AGTGCACAGC CAGGACCTCA GTGGAGGCCT GAAACGGCCG AGCACAAGGA
801 CTCCCAACGC GAATGGCACG GAGCGGACTC GGTCCCCCCC ACCCAGGCCC
851 CCGCCACCCG CTATTAATAG TCTCCACACA AGCCCTCGGC TGGCCAGGAC
901 TGAAACCCAG ACGAGCATGC CAGAAACAGT CAACCATAAC AAACATGGGA
951 ACGTAGCTCT GCCTGGAACG AAACCAACTC CCATCCCTCC ACCCCGGCTG
1001 AAGAAGCAGG CTTCTTTTCT GGAAGCAGAG GGCGGTGCAA AGACCTTGAG
1051 CGGCGGCCGG CCGGGCGCAG GCCCGGAGCT GGAGCTGGGC ACAGCTGGCA
1101 GCCCAGGTGG GGCCCCGCCT GAGGCCGCCC CGGGGGATTG CACAAGGGCC
1151 CCGCCGCCCA GCTCTGAATC ACGGCCCCCG TGCCATGGAG GCCGGCAGCG
1201 GCTGAGCGAC ATGAGCATTT CTACTTCCTC CTCCGACTCG CTGGAGTTCG
1251 ACCGGAGCAT GCCTCTGTTT GGCTACGAGG CGGACACCAA CAGCAGCCTG
1301 GAGGACTACG AGGGGGAAAG TGACCAAGAG ACCATGGCGC CCCCCATCAA
1351 GTCCAAAAAG AAAAGGAGCA GCTCCTTCGT GCTGCCCAAG CTCGTCAAGT
1401 CCCAGCTGCA GAAGGTGAGC GGGGTGTTCA GCTCCTTCAT GACCCCGGAG
1451 AAGCGGATGG TCCGCAGGAT CGCCGAGCTT TCCCGGGACA AATGCACCTA
1501 CTTCGGGTGC TTAGTGCAGG ACTACGTGAG CTTCCTGCAG GAGAACAAGG
1551 AGTGCCACGT GTCCAGCACC GACATGCTGC AGACCATCCG GCAGTTCATG
1601 ACCCAGGTCA AGAACTATTT GTCTCAGAGC TCGGAGCTGG ACCCCCCCAT
1651 CGAGTCGCTG ATCCCTGAAG ACCAAATAGA TGTGGTGCTG GAAAAAGCCA
1701 TGCACAAGTG CATCTTGAAG CCCCTCAAGG GGCATGTGGA GGCCATGCTG
1751 AAGGACTTTC ACATGGCCGA TGGCTCATGG AAGCAACTCA AGGAGAACCT
1801 GCAGCTTGTG CGGCAGAGGA ATCCGCAGGA GCTGGGGGTC TTCGCCCCGA
1851 CCCCTGATTT TGTGGATGTG GAGAAAATCA AAGTCAAGTT CATGACCATG
1901 CAGAAGATGT ATTCGCCGGA AAAGAAGGTC ATGCTGCTGC TGCGGGTCTG
1951 CAAGCTCATT TACACGGTCA TGGAGAACAA CTCAGGGAGG ATGTATGGCG
2001 CTGATGACTT CTTGCCAGTC CTGACCTATG TCATAGCCCA GTGTGACATG
2051 CTTGAATTGG ACACTGAAAT CGAGTACATG ATGGAGCTCC TAGACCCATC
2101 GCTGTTACAT GGAGAAGGAG GCTATTACTT GACAAGCGCA TATGGAGCAC
2151 TTTCTCTGAT AAAGAATTTC CAAGAAGAAC AAGCAGCGCG ACTGCTCAGC
2201 TCAGAAACCA GAGACACCCT GAGGCAGTGG CACAAACGGA GAACCACCAA
2251 CCGGACCATC CCCTCTGTGG ACGACTTCCA GAATTACCTC CGAGTTGCAT
2301 TTCAGGAGGT CAACAGTGGT TGCACAGGAA AGACCCTCCT TGTGAGACCT
2351 TACATCACCA CTGAGGATGT GTGTCAGATC TGCGCTGAGA AGTTCAAGGT
2401 GGGGGACCCT GAGGAGTACA GCCTCTTTCT CTTCGTTGAC GAGACATGGC
2451 AGCAGCTGGC AGAGGACACT TACCCTCAAA AAATCAAGGC GGAGCTGCAC
2501 AGCCGACCAC AGCCCCACAT CTTCCACTTT GTCTACAAAC GCATCAAGAA
2551 CGATCCTTAT GGCATCATTT TCCAGAACGG GGAAGAAGAC CTCACCACCT
2601 CCTAGAAGAC AGGCGGGACT TCCCAGTGGT GCATCCAAAG GGGAGCTGGA
2651 AGCCTTGCCT TCCCGCTTCT ACATGCTTGA GCTTGAAAAG CAGTCACCTC
2701 CTCGGGGACC CCTCAGTGTA GTGACTAAGC CATCCACAGG CCAACTCGGC
2751 CAAGGGCAAC TTTAGCCACG CAAGGTAGCT GAGGTTTGTG AAACAGTAGG
2801 ATTCTCTTTT GGCAATGGAG AATTGCATCT GATGGTTCAA GTGTCCTGAG
2851 ATTGTTTGCT ACCTACCCCC AGTCAGGTTC TAGGTTGGCT TACAGGTATG
2901 TATATGTGCA GAAGAAACAC TTAAGATACA AGTTCTTTTG AATTCAACAG
2951 CAGATGCTTG CGATGCAGTG CGTCAGGTGA TTCTCACTCC TGTGGATGGC
3001 TTCATCCCTG CCTTCCTTCC TTTCTTXTTC CTTTTTTTTT [illegible]
3051 TTTTTACAAA GAGCCTTCAT GTTTTTATAT ATTTCATAGA AATTTTTATA
3101 GCAGTTGCAG GTAAACTGTC AGGATTGGTT TTAAAATATT TTTGTAACTT
3151 TAAAATATTC TATAATTATG CATGTGATTT TAACATTTAA TATTCAAAAA
3201 TAAATCTCTT GCTGGATTTG AGAGTATTGC ATTTTTAAAG TCTCTCTTCT
3251 GTAACTGGAT GTTTTGGCAA CTTTGTGGGG AGAGACTGCT GGATTTCTTA
3301 AAGCAACGTA TTCCTGACAC TGGCCACAGA ATGCCTTTGG AAATCGGATG
3351 TACTGTTCTC TTGTTCACGT TTAGTGGTGT TTTGCTGTTT TGTTTTTTAA
3401 ACAAATGATG CTGAGAATAA GGAGAGAAAT GAATGTAGAG AGAGGTAGAG
3451 AGAGAAATAT GAACTCTAAC AAAGGACTGA GGAGTGCAGT CTGCTGGTTC
3501 AGGCTCTTCA AAAGATGTAG AAAAAGAGAT AGAAGGAACC ACCTATGCTT
3551 AAAATACTGT AAATATGCAG TGAGGTTTGG CAAAATCTAT TCCATGTGTG
3601 ATTTGCTTGT AGAAACAATT TTGAAAGCCC CTTGAGGAAA ATAAAAATCA
3651 AGAAGAACAC TTTTCTCCCT TTTCCATACA AATTAAAACT TAACAGCATC
3701 AAATTATTGG GACCAGAAAC CAAGTAATGT ATAATGTGGC TTTTGTTGAG
3751 TTAAATAAGA TGCTATATAA TGGAGAAGAA TTTGAAAATG CACAAAAAAA
3801 TCAATCTACA TTATCAGAAC CTGCAGTGAA ATTAAACTTA TGTTAAATAA
3851 AACCAGTTTG CAGGTGCACA AACTATGAGG GTCTTGTATC CACGTAACAC
3901 AGGTAGTTAC AAAAACATCT TATTGTACTG TGTAAAGATG CATAGTCATC
3951 TCATTTGGTT GGCTTTGTAC CTTGTACCTT TTTTAGCCTT GGCTTTTGTT
4001 GAACTAGAAC CCTCAGCACA TACTGTGTTG TACTTTTGTA AATGATTTTT
4051 TAAATGGAAT TTTGCACATA ATACATTGTA ATACTGTATG ATAATCATGT
4101 GTGAAAATAA TTTTTGAAAT AAAAAAAAAA AAAAAAA



BLAST Results 



Entry 122483 from database EMBL : 
Sequence 15 from patent US 5527896. 
Length = 1829 
Plus Strand HSPs: 

Score = 9097 (1364.9 bits), Expect = 0.0, P = 0.0 
Identities = 1821/1823 (99%), Positives = 1821/1823 (99%), 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2602 bp; peptide length: 861 
Category: known protein 

Classification: Cell signaling/communication 
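
The relationship between the ORF coordinates and the peptide length can be checked directly. The Python sketch below is illustrative and not part of the original report; it assumes the standard genetic code, 1-based inclusive ORF coordinates that exclude the stop codon, and hypothetical helper names (`orf_peptide_length`, `translate`). The 50 nucleotides are read across the first row of the DKFZphutel_20g21 listing.

```python
# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def orf_peptide_length(start, end):
    """Codon count for an ORF given 1-based, inclusive nucleotide coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3

def translate(dna):
    """Translate complete codons from the start of dna; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Nucleotides 1-50 of the insert, read across the first row of the listing.
first_50 = "GGGAGAACTGAAACAGGAGATGGTGCGGACAGATGTCAACCTGGAAAATG"

print(orf_peptide_length(20, 2602))   # 861
print(translate(first_50[20 - 1:]))   # MVRTDVNLEN
```

The same arithmetic applied to an ORF from 78 bp to 2942 bp gives 955 codons.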



1 MVRTDVNLEN GLEPAETHSM VRHKDGGYSE EEDVKTCARD SGYDSLSNRL
51 SILDRLLHTH PIWLQLSLSE EEAAEVLQAQ PPGIFLVHKS TKMQKKVLSL
101 RLPCEFGAPL KEFAIKESTY TFSLEGSGIS FADLFRLIAF YCISRDVLPF
151 TLKLPYAIST AKSEAQLEEL AQMGLNFWSS PADSKPPNLP PPHRPLSSDG
201 VCPASLRQLC LINGVHSIKT RTPSELECSQ TNGALCFINP LFLKVHSQDL
251 SGGLKRPSTR TPNANGTERT RSPPPRPPPP AINSLHTSPR LARTETQTSM
301 PETVNHNKHG NVALPGTKPT PIPPPRLKKQ ASFLEAEGGA KTLSGGRPGA
351 GPELELGTAG SPGGAPPEAA PGDCTRAPPP SSESRPPCHG GRQRLSDMSI
401 STSSSDSLEF DRSMPLFGYE ADTNSSLEDY EGESDQETMA PPIKSKKKRS
451 SSFVLPKLVK SQLQKVSGVF SSFMTPEKRM VRRIAELSRD KCTYFGCLVQ
501 DYVSFLQENK ECHVSSTDML QTIRQFMTQV KNYLSQSSEL DPPIESLIPE
551 DQIDVVLEKA MHKCILKPLK GHVEAMLKDF HMADGSWKQL KENLQLVRQR
601 NPQELGVFAP TPDFVDVEKI KVKFMTMQKM YSPEKKVMLL LRVCKLIYTV
651 MENNSGRMYG ADDFLPVLTY VIAQCDMLEL DTEIEYMMEL LDPSLLHGEG
701 GYYLTSAYGA LSLIKNFQEE QAARLLSSET RDTLRQWHKR RTTNRTIPSV
751 DDFQNYLRVA FQEVNSGCTG KTLLVRPYIT TEDVCQICAE KFKVGDPEEY
801 SLFLFVDETW QQLAEDTYPQ KIKAELHSRP QPHIFHFVYK RIKNDPYGII
851 FQNGEEDLTT S
BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_20g21, frame 2 

TREMBL:RNU80076_1 product: "RIN1"; Rattus norvegicus RIN1 mRNA,
complete cds., N = 3, Score = 606, P = 6.8e-97

PIR:A38637 Ras interactor RIN1 - human, N = 3, Score = 587, P = 1.9e-92

TREMBL:HSRASINL_1 product: "ras inhibitor"; Human ras inhibitor mRNA,
3' end., N = 2, Score = 592, P = 9.8e-61

SWISSPROT:RIN1_HUMAN RAS INTERACTION/INTERFERENCE PROTEIN 1 (RAS
INHIBITOR JC99) (FRAGMENT)., N = 2, Score = 587, P = 4.1e-60

PIR:B38637 Ras inhibitor (clone JC265) - human (fragment), N = 1, Score
= 2446, P = 4.6e-254



>PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 
Length = 471 

HSPs: 

Score = 2446 (367.0 bits), Expect = 4.6e-254, P = 4.6e-254 
Identities = 471/471 (100%), Positives = 471/471 (100%)

Query: 391 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 450

GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS
Sbjct: 1 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 60

Query: 451 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 510

SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK
Sbjct: 61 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 120

Query: 511 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 570

ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK
Sbjct: 121 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 180

Query: 571 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 630 

GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 
Sbjct: 181 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 240 

Query: 631 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 690 

YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 
Sbjct: 241 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 300

Query: 691 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 750 

LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 
Sbjct: 301 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 360 

Query: 751 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 810 

DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 
Sbjct: 361 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 420 

Query: 811 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 861 

QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 
Sbjct: 421 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 471 



Pedant information for DKFZphutel_20g21, frame 2 



Report for DKFZphutel_20g21.2

[LENGTH] 861
[MW] 96380.26
[pI] 6.15
[HOMOL] PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 0.0
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 3e-10
[PIRKW] alternative splicing 3e-59
[SUPFAM] Ras interactor RIN1 3e-59
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.27 %

SEQ MVRTDVNLENGLEPAETHSMVRHKDGGYSEEEDVKTCARDSGYDSLSNRLSILDRLLHTH 

SEG 

PRD ccccceeeccccccccceeeeeecccccccccceeeeeeccccccchhhhhhhhhhhhhh 

SEQ PIWLQLSLSEEEAAEVLQAQPPGIFLVHKSTKMQKKVLSLRLPCEFGAPLKEFAIKESTY 

SEG . . . xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccccceeeeechhhhhhhhhhhcccccccccceeeeeeecc 

SEQ TFSLEGSGISFADLFRLIAFYCISRDVLPFTLKLPYAISTAKSEAQLEELAQMGLNFWSS 

SEG 

PRD ceeecccccchhhhhhhhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhccccccc 

SEQ PADSKPPNLPPPHRPLSSDGVCPASLRQLCLINGVHSIKTRTPSELECSQTNGALCFINP 

SEG xxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhcccccccccccccccccccccccceeeecc 

SEQ LFLKVHSQDLSGGLKRPSTRTPNANGTERTRSPPPRPPPPAINSLHTSPRLARTETQTSM 

SEG xxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PETVNHNKHGNVALPGTKPTPIPPPRLKKQASFLEAEGGAKTLSGGRPGAGPELELGTAG 

SEG xxxxxxxxxxx xx 

PRD eeeeeccccccccccccccccccccchhhhhhhhhhhccccccccccccccceeeeeccc 

SEQ SPGGAPPEAAPGDCTRAPPPSSESRPPCHGGRQRLSDMSISTSSSDSLEFDRSMPLFGYE 

SEG xxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeccccccceee 

SEQ ADTNSSLEDYEGESDQETMAPPIKSKKKRSSSFVLPKLVKSQLQKVSGVFSSFMTPEKRM 

SEG xxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhcchhhh 

SEQ VRRIAELSRDKCTYFGCLVQDYVSFLQENKECHVSSTDMLQTIRQFMTQVKNYLSQSSEL 

SEG 

PRD hhhhhhhhhhchhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ DPPIESLIPEDQIDVVLEKAMHKCILKPLKGHVEAMLKDFHMADGSWKQLKENLQLVRQR 

SEG 

PRD ccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhccccchhhhhhhhhhhhh 

SEQ NPQELGVFAPTPDFVDVEKIKVKFMTMQKMYSPEKKVMLLLRVCKLIYTVMENNSGRMYG 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhcccccc 

SEQ ADDFLPVLTYVIAQCDMLELDTEIEYMMELLDPSLLHGEGGYYLTSAYGALSLIKNFQEE 

SEG 

PRD cccccccceeecccccchhhhhhhhhhhhhhcccccccccceeeeehhhhhhhhhhhhhh 

SEQ QAARLLSSETRDTLRQWHKRRTTNRTIPSVDDFQNYLRVAFQEVNSGCTGKTLLVRPYIT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhccccccceeeeecccccc 

SEQ TEDVCQICAEKFKVGDPEEYSLFLFVDETWQQLAEDTYPQKIKAELHSRPQPHIFHFVYK 

SEG 

PRD chhhhhhhhhheeecccccceeeeehhhhhhcccccccchhhhhhhhhccccceeeehhh 

SEQ RIKNDPYGIIFQNGEEDLTTS

SEG 

PRD hhccccceeeeeccccccccc 
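
The LOW_COMPLEXITY percentage reported for this peptide is consistent with the number of residues flagged `x` in the SEG rows divided by the peptide length. The short Python sketch below illustrates that arithmetic; it is not part of the original report, the function names are hypothetical, and the run lengths were counted by hand from the SEG rows of this listing, so they should be treated as an assumption.

```python
def masked_residues(seg_rows):
    """Count residues flagged 'x' in SEG mask rows; any other character is ignored."""
    return sum(row.lower().count("x") for row in seg_rows)

def low_complexity_percent(masked, length):
    """Masked fraction of the peptide as a percentage, rounded to two decimals."""
    return round(100.0 * masked / length, 2)

# Lengths of the 'x' runs in the SEG rows, in order of appearance
# (hand-counted from the listing; an assumption, not data from the report).
run_lengths = [17, 10, 8, 11, 2, 12, 10, 18, 9]
seg_rows = ["x" * n for n in run_lengths]

masked = masked_residues(seg_rows)
print(masked, low_complexity_percent(masked, 861))   # 97 11.27
```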



(No Prosite data available for DKFZphutel_20g21.2)
(No Pfam data available for DKFZphutel_20g21.2)
DKFZphutel_20hl3



group: intracellular transport and trafficking 

DKFZphutel_20hl3 encodes a novel 955 amino acid protein with similarity to alpha-adaptins.

Adaptins are components of the adaptor complexes which link clathrin to receptors in coated
vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles,
separate into two bands on SDS gels, designated A and C. The novel protein is very similar to
both alpha-adaptin A and C and is a new human alpha-adaptin.

The new protein can find application in modulating endocytosis and vesicle trafficking in
cells.



strong similarity to alpha-adaptins 

complete cDNA, complete cds start at Bp 78, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3352 bp 

Poly A stretch at pos. 3297, polyadenylation signal at pos. 3279
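
The two reported positions can be located in the 3' end of the insert with a simple string search. The following Python sketch is illustrative only: it assumes the canonical AATAAA hexamer is the polyadenylation signal the report refers to, the function names and the `min_run` threshold are hypothetical, and the sequence is the last 102 nucleotides (positions 3251 to 3352) as printed in the listing.

```python
import re

def find_signal(seq, offset=1, motif="AATAAA"):
    """1-based position of the first motif occurrence, or None; offset is the position of seq[0]."""
    i = seq.find(motif)
    return None if i < 0 else offset + i

def find_poly_a(seq, offset=1, min_run=10):
    """1-based position where the first run of at least min_run A's begins, or None."""
    m = re.search("A{%d,}" % min_run, seq)
    return None if m is None else offset + m.start()

# Positions 3251-3352 of the DKFZphutel_20hl3 insert, copied from the listing.
tail = (
    "CATCCAGGGGCTGTGTATTATTGTGAGCGAATAAACAGAGAGACGCTAAA"
    + "A" * 52
)

print(find_signal(tail, offset=3251))   # 3280
print(find_poly_a(tail, offset=3251))   # 3298
```

With 1-based numbering the sketch finds the hexamer at 3280 and the start of the A run at 3298, each one higher than the reported 3279 and 3297. This suggests, but does not prove, that the report counts positions from zero.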



1 GCGCCCGGTC CCCGCTTGCC AGCCCCCGCT GCTCTGTGCC CTGTCCGGCC 
51 AGGCCTGGAG CCGACACCAC CGCCATCATG CCGGCCGTGT CCAAGGGCGA 
101 TGGGATGCGG GGGCTCGCGG TGTTCATCTC CGACATCCGG AACTGTAAGA 
151 GCAAAGAGGC GGAAATTAAG AGAATCAACA AGGAACTGGC CAACATCCGC 
201 TCCAAGTTCA AAGGAGACAA AGCCTTGGAT GGCTACAGTA AGAAAAAATA 
251 TGTGTGTAAA CTGCTTTTCA TCTTCCTGCT TGGCCATGAC ATTGACTTTG 
301 GGCACATGGA GGCTGTGAAT CTGTTGAGTT CCAATAAATA CACAGAGAAG 
351 CAAATAGGTT ACCTGTTCAT TTCTGTGCTG GTGAACTCGA ACTCGGAGCT 
401 GATCCGCCTC ATCAACAACG CCATCAAGAA TGACCTGGCC AGCCGCAACC 
451 CCACCTTCAT GTGCCTGGCC CTGCACTGCA TCGCCAACGT GGGCAGCCGG 
501 GAGATGGGCG AGGCCTTTGC CGCTGACATC CCCCGCATCC TGGTGGCCGG 
551 GGACAGCATG GACAGTGTCA AGCAGAGTGC GGCCCTGTGC CTCCTTCGAC 
601 TGTACAAGGC CTCGCCTGAC CTGGTGCCCA TGCGCGAGTG GACGGCGCGT 
651 GTGGTACACC TGCTCAATGA CCAGCACATG GGTGTGGTCA CGGCCGCCGT 
701 CAGCCTCATC ACCTGTCTCT GCAAGAAGAA CCCAGATGAC TTCAAGACGT 
751 GCGTCTCTCT GGCTGTGTCG CGCCTGAGCC GGATCGTCTC CTCTGCCTCC 
801 ACCGACCTCC AGGACTACAC CTACTACTTC GTCCCAGCAC CCTGGCTCTC 
851 GGTGAAGCTC CTGCGGCTGC TGCAGTGCTA CCCGCCTCCA GAGGATGCGG 
901 CTGTGAAGGG GCGGCTGGTG GAATGTCTGG AGACTGTGCT CAACAAGGCC 
951 CAGGAGCCCC CCAAATCCAA GAAGGTGCAG CATTCCAACG CCAAGAACGC 
1001 CATCCTCTTC GAGACCATCA GCCTCATCAT CCACTATGAC AGTGAGCCCA 
1051 ACCTCCTGGT TCGGGCCTGC AACCAGCTGG GCCAGTTCCT GCAGCACCGG
1101 GAGACCAACC TGCGCTACCT GGCCCTGGAG AGCATGTGCA CGCTGGCCAG 
1151 CTCCGAGTTC TCCCATGAAG CCGTCAAGAC GCACATTGAC ACCGTCATCA 
1201 ATGCCCTCAA GACGGAGCGG GACGTCAGCG TGCGGCAGCG GGCGGCTGAC 
1251 CTCCTCTACG CCATGTGTGA CCGGAGCAAT GCCAAGCAGA TCGTGTCGGA 
1301 GATGCTGCGG TACCTGGAGA CGGCAGACTA CGCCATCCGC GAGGAGATCG 
1351 TCCTGAAGGT GGCCATCCTG GCCGAGAAGT ACGCCGTGGA CTACAGCTGG 
1401 TACGTGGACA CCATCCTCAA CCTCATCCGC ATTGCGGGCG ACTACGTGAG 
1451 TGAGGAGGTG TGGTACCGTG TGCTACAGAT CGTCACCAAC CGTGATGACG 
1501 TCCAGGGCTA TGCCGCCAAG ACCGTCTTTG AGGCGCTCCA GGCCCCTGCC 
1551 TGTCACGAGA ACATGGTGAA GGTTGGCGGC TACATCCTTG GGGAGTTTGG 
1601 GAACCTGATT GCTGGGGACC CCCGCTCCAG CCCCCCAGTG CAGTTCTCCC 
1651 TGCTCCACTC CAAGTTCCAT CTGTGCAGCG TGGCCACGCG GGCGCTGCTG 
1701 CTGTCCACCT ACATCAAGTT CATCAACCTC TTCCCCGAGA CCAAGGCCAC
1751 CATCCAGGGC GTCCTGCGGG CCGGCTCCCA GCTGCGCAAT GCTGACGTGG
1801 AGCTGCAGCA GCGAGCCGTG GAGTACCTCA CCCTCAGCTC AGTGGCCAGC 
1851 ACCGACGTCC TGGCCACGGT GCTGGAGGAG ATGCCGCCCT TCCCCGAGCG 
1901 CGAGTCGTCC ATCCTGGCCA AGCTGAAACG CAAGAAGGGG CCAGGGGCCG 
1951 GCAGCGCCCT GGACGATGGC CGGAGGGACC CCAGCAGCAA CGACATCAAC 
2001 GGGGGCATGG AGCCCACCCC CAGCACTGTG TCGACGCCCT CGCCCTCCGC 
2051 CGACCTCCTG GGGCTGCGGG CAGCCCCTCC CCCGGCAGCA CCCCCGGCTT 
2101 CTGCAGGAGC AGGGAACCTT CTGGTGGACG TCTTCGATGG CCCGGCCGCC 
2151 CAGCCCAGCC TGGGGCCCAC CCCCGAGGAG GCCTTCCTCA GCCCAGGTCC 
2201 TGAGGACATC GGCCCTCCCA TTCCGGAAGC CGATGAGTTG CTGAATAAGT 
2251 TTGTGTGTAA GAACAACGGG GTCCTGTTCG AGAACCAGCT GCTGCAGATC 

2301 GGAGTCAAGT CAGAGTTCCG ACAGAACCTG GGCCGCATGT ATCTCTTCTA
2351 TGGCAACAAG ACCTCGGTGC AGTTCCAGAA TTTCTCACCC ACTGTGGTTC
2401 ACCCGGGAGA CCTCCAGACT CAGCTGGCTG TGCAGACCAA GCGCGTGGCG
2451 GCGCAGGTGG ACGGCGGCGC GCAGGTGCAG CAGGTGCTCA ATATCGAGTG
2501 CCTGCGGGAC TTCCTGACGC CCCCGCTGCT GTCCGTGCGC TTCCGGTACG
2551 GTGGCGCCCC CCAGGCCCTC ACCCTGAAGC TCCCAGTGAC CATCAACAAG
2601 TTCTTCCAGC CCACCGAGAT GGCGGCCCAG GATTTCTTCC AGCGCTGGAA
2651 GCAGCTGAGC CTCCCTCAAC AGGAGGCGCA GAAAATCTTC AAAGCCAACC
2701 ACCCCATGGA CGCAGAAGTT ACTAAGGCCA AGCTTCTGGG GTTTGGCTCT
2751 GCTCTCCTGG ACAATGTGGA CCCCAACCCT GAGAACTTCG TGGGGGCGGG
2801 GATCATCCAG ACTAAAGCCC TGCAGGTGGG CTGTCTGCTT CGGCTGGAGC
2851 CCAATGCCCA GGCCCAGATG TACCGGCTGA CCCTGCGCAC CAGCAAGGAG
2901 CCCGTCTCCC GTCACCTGTG TGAGCTGCTG GCACAGCAGT TCTGAGCCCT
2951 GGACTCTGCC CCGGGGGATG TGGCCGGCAC TGGGCAGCCC CTTGGACTGA
3001 GGCAGTTTTG GTGGATGGGG GACCTCCACT GGTGACAGAG AAGACACCAG
3051 GGTTTGGGGG ATGCCTGGGA CTTTCCTCCG GCCTTTTGTA TTTTTATTTT
3101 TGTTCATCTG CTGCTGTTTA CATTCTGGGG GGTTAGGGGG AGTCCCCCTC
3151 CCTCCCTTTC CCCCCCAAGC ACAGAGGGGA GAGGGGCCAG GGAAGTGGAT
3201 GTCTCCTCCC CTCCCACCCC ACCCTGTTGT AGCCCCTCCT ACCCCCTCCC
3251 CATCCAGGGG CTGTGTATTA TTGTGAGCGA ATAAACAGAG AGACGCTAAA
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3351 AA



BLAST Results 



No BLAST result 



Medline entries 



89155572: 

Cloning of cDNAs encoding two related 100-kD coated vesicle proteins
(alpha-adaptins).

97431776:

Alpha-adaptin, a marker for endocytosis, is expressed in complex
patterns during Drosophila development.



Peptide information for frame 3 



ORF from 78 bp to 2942 bp; peptide length: 955 
Category: strong similarity to known protein 



1 MPAVSKGDGM RGLAVFISDI RNCKSKEAEI KRINKELANI RSKFKGDKAL
51 DGYSKKKYVC KLLFIFLLGH DIDFGHMEAV NLLSSNKYTE KQIGYLFISV
101 LVNSNSELIR LINNAIKNDL ASRNPTFMCL ALHCIANVGS REMGEAFAAD
151 IPRILVAGDS MDSVKQSAAL CLLRLYKASP DLVPMGEWTA RVVHLLNDQH
201 MGVVTAAVSL ITCLCKKNPD DFKTCVSLAV SRLSRIVSSA STDLQDYTYY
251 FVPAPWLSVK LLRLLQCYPP PEDAAVKGRL VECLETVLNK AQEPPKSKKV
301 QHSNAKNAIL FETISLIIHY DSEPNLLVRA CNQLGQFLQH RETNLRYLAL
351 ESMCTLASSE FSHEAVKTHI DTVINALKTE RDVSVRQRAA DLLYAMCDRS
401 NAKQIVSEML RYLETADYAI REEIVLKVAI LAEKYAVDYS WYVDTILNLI
451 RIAGDYVSEE VWYRVLQIVT NRDDVQGYAA KTVFEALQAP ACHENMVKVG
501 GYILGEFGNL IAGDPRSSPP VQFSLLHSKF HLCSVATRAL LLSTYIKFIN
551 LFPETKATIQ GVLRAGSQLR NADVELQQRA VEYLTLSSVA STDVLATVLE
601 EMPPFPERES SILAKLKRKK GPGAGSALDD GRRDPSSNDI NGGMEPTPST
651 VSTPSPSADL LGLRAAPPPA APPASAGAGN LLVDVFDGPA AQPSLGPTPE
701 EAFLSPGPED IGPPIPEADE LLNKFVCKNN GVLFENQLLQ IGVKSEFRQN
751 LGRMYLFYGN KTSVQFQNFS PTVVHPGDLQ TQLAVQTKRV AAQVDGGAQV
801 QQVLNIECLR DFLTPPLLSV RFRYGGAPQA LTLKLPVTIN KFFQPTEMAA
851 QDFFQRWKQL SLPQQEAQKI FKANHPMDAE VTKAKLLGFG SALLDNVDPN
901 PENFVGAGII QTKALQVGCL LRLEPNAQAQ MYRLTLRTSK EPVSRHLCEL
951 LAQQF



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_20hl3, frame 3

PIR:B30111 alpha-adaptin C - mouse, N = 1, Score = 3990, P = 0

PIR:S11276 alpha-adaptin c - rat, N = 1, Score = 3987, P = 0

SWISSPROT:ADAC_RAT ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX 2
ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA MEMBRANE
ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score = 3982, P = 0

SWISSPROT:ADAC_MOUSE ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX
2 ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA
MEMBRANE ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score =
3976, P = 0

TREMBL:AB020706_1 gene: "KIAA0899"; product: "KIAA0899 protein"; Homo 
sapiens mRNA for KIAA0899 protein, partial cds., N = 1, Score = 3932, P 
= 0 



>PIR:B30111 alpha-adaptin C - mouse 
Length = 938 

HSPs : 

Score = 3990 (598.6 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 787/955 (82%), Positives = 858/955 (89%)
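
The percentages on the HSP summary line follow from the match counts. As a hedged illustration (not part of the original report), the Python sketch below parses such a line and recomputes the percentages; the function name is hypothetical, and rounding down is an assumption chosen because it reproduces the figures printed in this listing.

```python
import re

def hsp_percentages(line):
    """Parse 'Name = matches/length' pairs from a BLAST HSP summary line.

    Returns {name: (matches, length, percent)} with the percentage rounded down.
    """
    result = {}
    for name, matches, length in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        matches, length = int(matches), int(length)
        result[name] = (matches, length, matches * 100 // length)
    return result

summary = "Identities = 787/955 (82%), Positives = 858/955 (89%)"
print(hsp_percentages(summary))
# {'Identities': (787, 955, 82), 'Positives': (858, 955, 89)}
```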

Query: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 
Sbjct: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

Query: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120 

KLLFIFLLGHDIDFGHMEAVNLLSSN+YTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 
Sbjct: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120 

Query: 121 ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 180 

ASRNPTFM LALHCIANVGSREM EAFA +IP+ILVAGD+MDSVKQSAALCLLRLY+ SP 
Sbjct: 121 ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 180 

Query: 181 DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 240

DLVPMG+WT+RVVHLLNDQH+GVVTAA SLIT L +KNP++FKT VSLAVSRLSRIV+SA 
Sbjct: 181 DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 240 

Query: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 300 

STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP D AV+GRL ECLET+LNKAQEPPKSKKV 
Sbjct: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP-DPAVRGRLTECLETILNKAQEPPKSKKV 299 

Query: 301 QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 360 

QHSNAKNA+LFE ISLIIH+DSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 
Sbjct: 300 QHSNAKNAVLFEAISLIIHHDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 359 

Query: 361 FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 420 

FSHEAVKTHI+TVINALKTERDVSVRQRA DLLYAMCDRSNA+QIV+EML YLETADY+I
Sbjct: 360 FSHEAVKTHIETVINALKTERDVSVRQRAVDLLYAMCDRSNAQQIVAEMLSYLETADYSI 419 

Query: 421 REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 480 

REEIVLKVAILAEKYAVDY+WYVDTILNLIRIAGDYVSEEVWYRV+QIV NRDDVQGYAA 
Sbjct: 420 REEIVLKVAILAEKYAVDYTWYVDTILNLIRIAGDYVSEEVWYRVIQIVINRDDVQGYAA 479 

Query: 481 KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 540 

KTVFEALQAPACHEN+VKVGGYILGEFGNLIAGDPRSSP +QF+LLHSKFHLCSV TRAL 
Sbjct: 480 KTVFEALQAPACHENLVKVGGYILGEFGNLIAGDPRSSPLIQFNLLHSKFHLCSVPTRAL 539 

Query: 541 LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 600 

LLSTYIKF+NLFPE KATIQ VLR+ SQL+NADVELQQRAVEYL LS+VASTD+LATVLE 
Sbjct: 540 LLSTYIKFVNLFPEVKATIQDVLRSDSQLKNADVELQQRAVEYLRLSTVASTDILATVLE 599 

Query: 601 EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTP STVSTPSPS 657 

EMPPFPERESSILAKLK+KKGP + L++ +R+ S D+NGG EP P S STPSPS 
Sbjct: 600 EMPPFPERESSILAKLKKKKGPSTVTDLEETKRERSI-DVNGGPEPVPASTSAASTPSPS 658 

Query: 658 ADLLGLRAAPP-PAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIP 716 

ADLLGL A PP P PP S+G G LLVDVF A+ ++ P L+PG ED 
Sbjct: 659 ADLLGLGAVPPAPTGPPPSSGGG-LLVDVFSDSAS— AVAP LAPGSEDN 704 

Query: 717 EADELLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHP 776 

+FVCKNNGVLFENQLLQIG+KSEFRQNLGRM++FYGNKTS QF NF+PT++ 
Sbjct: 705 FARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA 759 

Query: 777 GDLQTQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLP 836 

DLQT L +QTK V VDGGAQVQQV+NIEC+ DF P+L+++FRYGG Q +++KLP 
Sbjct: 760 DDLQTNLNLQTKPVDPTVDGGAQVQQVVNIECISDFTEAPVLNIQFRYGGTFQNVSVKLP 819 

Query: 837 VTINKFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDN 896 

+T+NKFFQPTEMA+QDFFQRWKQLS PQQE Q IFKA HPMD E+TKAK++GFGSALL+ 
Sbjct: 820 ITLNKFFQPTEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDTEITKAKIIGFGSALLEE 879

Query: 897 VDPNPENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 955 
VDPNP NFVGAGII TK  Q+GCLLRLEPN QAQMYRLTLRTSK+ VS+ LCELL++QF
Sbjct: 880 VDPNPANFVGAGIIHTKTTQIGCLLRLEPNLQAQMYRLTLRTSKDTVSQRLCELLSEQF 938

Pedant information for DKFZphutel_20hl3, frame 3

Report for DKFZphutel_20hl3.3



[LENGTH] 955
[MW] 105361.97
[pI] 7.75
[HOMOL] PIR:A30111 alpha-adaptin A - mouse 0.0
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBL037w] 5e-67
[FUNCAT] 08.19 cellular import [S. cerevisiae, YBL037w] 5e-67
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBL037w] 5e-67
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDR238c] 4e-04
[PIRKW] heterodimer 0.0
[PIRKW] transmembrane protein 1e-65
[PIRKW] membrane trafficking 0.0
[PIRKW] receptor 0.0
[SUPFAM] beta-adaptin 5e-16
[PROSITE] MYRISTYL 7
[PROSITE] IG_MHC 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 11
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 15
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 6.81 %



SEQ MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhh 

SEQ KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 

SEG 

PRD hhhhhhhcccccccchhhhhhhhhcccccchhhhhhhhhhhhhcchhhhhhhhhhhhhcc 

SEQ ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 

SEG 

PRD cccccchhhhhhhhhhccchhhhhhhhhhhhhheeeccccchhhhhhhhhhhhhhhhhcc 

SEQ DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA

SEG 

PRD cccccccchhhhhhhhhcccceeeehhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcc 

SEQ STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 

SEG 

PRD ccccccceeeecccchhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccccc 

SEQ QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 

SEG 

PRD cccccchhhhhhhhhhhhhcccccceeeeehhhhhhhhhhccccceeeehhhhhhhhhcc 

SEQ FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 

SEG 

PRD cchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccch 

SEQ REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA

SEG 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhccccchhhhhhhheeeccccchhhhhh 

SEQ KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 

SEG 

PRD hhhhhhhhhhcccccceeeeeeeecccccccccccccccchhhhhhhhhhhcccchhhhh 

SEQ LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE

SEG 

PRD hhhhhhhhhhccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhhh 

SEQ EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTPSTVSTPSPSADL 

SEG xxxxxxxxxxxxxxx

PRD hccccccchhhhhhhhhhccccccccccccccccccccccccccccccccccccccccce 

SEQ LGLRAAPPPAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIPEADE 

SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx
PRD eecccccccccccccccccceeeeeeccccccccccccccceeecccccccccccccccc

SEQ LLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHPGDLQ 

SEG 

PRD cceeeeeccccccchhhhhhhhcchhhhhccccceeeccccccccccccceeeeccchhh 

SEQ TQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLPVTIN 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccchhhhhhhhhhccccccccceeeeeeccccccccccccccccc 

SEQ KFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDNVDPN 

SEG 

PRD cccccchhhhhhhhhhhhhhhchhhhhhhhhhhcccchhhhhhhhhhccccceeeecccc 

SEQ PENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 

SEG 

PRD ccceeeceeeeeccccceeeeecccchhhhhhhhhhhccccchhhhhhhhhhccc 



Prosite for DKFZphutel_20hl3.3

             ->790   PKC_PHOSPHO_SITE    PDOC00005
PS00005   819->822   PKC_PHOSPHO_SITE    PDOC00005
PS00005   832->835   PKC_PHOSPHO_SITE    PDOC00005
PS00005   935->938   PKC_PHOSPHO_SITE    PDOC00005
PS00005   938->941   PKC_PHOSPHO_SITE    PDOC00005
PS00006   5->9       CK2_PHOSPHO_SITE    PDOC00006
PS00006   104->108   CK2_PHOSPHO_SITE    PDOC00006
PS00006   368->372   CK2_PHOSPHO_SITE    PDOC00006
PS00006   379->383   CK2_PHOSPHO_SITE    PDOC00006
PS00006   470->474   CK2_PHOSPHO_SITE    PDOC00006
PS00006   482->486   CK2_PHOSPHO_SITE    PDOC00006
PS00006   597->601   CK2_PHOSPHO_SITE    PDOC00006
PS00006   626->630   CK2_PHOSPHO_SITE    PDOC00006
PS00006   636->640   CK2_PHOSPHO_SITE    PDOC00006
PS00006   698->702   CK2_PHOSPHO_SITE    PDOC00006
PS00006   938->942   CK2_PHOSPHO_SITE    PDOC00006
PS00007   388->395   TYR_PHOSPHO_SITE    PDOC00007
PS00007   411->419   TYR_PHOSPHO_SITE    PDOC00007
PS00007   434->443   TYR_PHOSPHO_SITE    PDOC00007
PS00008   202->208   MYRISTYL            PDOC00008
PS00008   508->514   MYRISTYL            PDOC00008
PS00008   561->567   MYRISTYL            PDOC00008
PS00008   623->629   MYRISTYL            PDOC00008
PS00008   759->765   MYRISTYL            PDOC00008
PS00008   826->832   MYRISTYL            PDOC00008
PS00008   908->914   MYRISTYL            PDOC00008
PS00009   630->634   AMIDATION           PDOC00009
PS00290   127->134   IG_MHC              PDOC00262
DKFZphutel_20mll



group: cell cycle 

DKFZphutel_20mll encodes a novel 225 amino acid protein with similarity to yeast sds22 and 
protein phosphatase-1 regulatory subunits. 

sds22 is a regulatory polypeptide of protein phosphatase-1 that is required for the completion 
of mitosis in both fission and budding yeast. The novel protein seems to be a new regulator 
protein for protein phosphatase-1. 

The new protein can find application in modulating/blocking the activity of protein 
phosphatase-1 and in modulating the cell cycle. 



similarity to suppressor protein sds22 

complete cDNA, complete cds, EST hits 
localisation? only a part of the 5T5 matches 



Sequenced by AGOWA 
Locus: /map="17"? 
Insert length: 5822 bp 

Poly A stretch at pos. 5803, polyadenylation signal at pos. 5786 



1 GGGCGCTTGG TTCCCCAGCA ACCGGGAGAC GCGTCTGCTG CGTGGAACCG 
51 CCGAGTTCCC AGCGCTTGAG AAGGAAAATT CTGGATCTGT TATCTGTGAG 
101 GAGGCCACTC CGTTGACAGT TGTGTAAAAC TCTGCTGCTT TCCCCAGCTC 
151 CAACCTCTCT GGTCTTCAAC AACACTATCA TCAGGGAAAA CGTGGGGGAA 
201 GATGAACCAG CCGTGCAACT CGATGGAGCC GAGGGTGATG GACGATGACA 
251 TGCTCAAGCT GGCCGTCGGG GACCAGGGCC CCCAGGAGGA GGCCGGGCAG 
301 CTGGCCAAGC AGGAGGGCAT CCTCTTCAAG GATGTCCTGT CCCTGCAGCT 
351 GGACTTTCGG AACATCCTCC GCATAGACAA CCTCTGGCAG TTTGAGAACT 
401 TGAGGAAGCT GCAGCTGGAC AATAACATCA TTGAGAAGAT CGAGGGCCTG 
451 GAGAACCTCG CACACCTGGT CTGGCTGGAT CTGTCTTTCA ACAACATTGA
501 GACCATCGAG GGGCTGGACA CACTGGTGAA CCTGGAGGAC CTGAGCTTGT 
551 TCAACAACCG GATCTCCAAG ATCGACTCCC TGGACGCCCT CGTCAAGCTG 
601 CAGGTGTTGT CGCTGGGCAA CAACCGGATT GACAACATGA TGAACATCAT 
651 CTACCTCCGG CGGTTCAAGT GCCTGCGGAC GCTCAGCCTC TCTAGGAACC 
701 CTATCTCTGA GGCAGAGGAT TACAAGATGT TCATCTGTGC CTACCTTCCT
751 GACCTCATGT ACCTGGACTA CCGGCGCATT GATGACCACA CAGCAAGTGT 
801 CTCCCTCTCA GTCTCCCAGC CCTGTGAGAC AGATTCCTCA AGCCCCCAGG 
851 TTTCTTGGAA AAGGGGCATT GAAGAGTAGC TTCCCCTGCC CACAACTAGG 
901 AGAGAAAGGG CAGCTCCCTC TTCCTAATCC CTTTACCTGA CTCTGTCAGA 
951 GTGATTCCAG CAGCACCCTT GTAAGTACTG TTTTGTGTGC GTTCCCAGGG 
1001 GCCAGGCCTC TTCCACACAC TGTCCCAGGG CCACCTCACA GCCATCCTGC 
1051 ACTGTCTAGT TTTCCAGATG AAGAAGCTGA GGAGGGCTGG GAGCAGTGGC 
1101 TCACGCCTGT AATCCCAGCA CTTTGAGAGG CTGAGGCGGG AGGATCGCTT 
1151 GAGCCAAGGA GTTCAAGACC AGCCTGGGCA ACATAGGGAG ACCCCATCTC 
1201 TACAGAAACT ACCAAAATTA GCCAGGTGTG GTGGCACACA CCAGTAATCC 
1251 TGGCTACTCA CAAGGCCGAG GTAGAAGAAT CGCTTGAGAC TAGGAGTTTG 
1301 AGGCTGCAGT GAACTAAGAA GATGCCATTG CACTCCAGCC TGGGCAACAG 
1351 AGTGAAAAAA TTAAAAAATT AGAAAAGAAA AGAAGTTGAG GAGGCCCAAG 
1401 GAGGGCAAGC AGCCAGGATC ACTGGCTCAA GGCCAAGCCA GGATTCACCC
1451 TAAGTTGGTG TCATCCCAGG AGCAATATTA ACAGCTGAGC TCCAGAGGGA
1501 ACCAGGCCAT CAGAGGCTCA GGCCTGGCTC TCAGGGGCAG AGTCAGGGCT
1551 GGAGGTAGAG ACCTGAGTGT CATCTGAGGA TTGCCAATTG GCAGTAGTTG 
1601 AAGCCATGGT ACAGGTGGGA TCACCTGGGG CACATGGAGT GAGCTGGGGG 
1651 ACGGGGACTA AGTTCTAGAG GTGCCAGCAT TCCTGGCCAG GTACAGGGGG 
1701 ATGAGCCAGT GCGGTGGAGA GAGCCAAGGG CCAGACCCTC GTGACCAGCC 

1751 CTATGGCCTC ACTCTACCTC TGTCCTGTTG TCCTCCTTCC CTAAAAGAGG
1801 GCCAGAAGGC CTGCTGAGGG CTGTTGGGAG TGAGAGAGCA AGTCCTCTGT 

1851 GGAGAACACC CAGTCTGGGG CGAGGGGAGC GCTCCATTGC TGTGGCTCCT
1901 GCCCTGGAGA TGGCCCCGGG AACCCCAGCC TGCCACGCTG CCTTCCGCTC 
1951 CTCCTGGTCT TTCCCTGATT TCCCTGCGCT CACAAAAACC TGGTGAGGGT 
2001 CATCAGGAGA TGGGCATTCT CATCCACGAG ACCTCATGGC TTTCACAGCC 
2051 TTCATGCAGG CCCCTGTGCA ACACCCCTGC CCATGCGCGG GAGGCTGCAG 
2101 CATGGCAGAG GCGGCATGGC AGAGGCGGTG TGGCTCGGAG GAACCTCTGG 
2151 TAACAATGCC ACTCCCGTTC CCTGGTCAGA AAAAGCTTGC GGAGGCTAAG 
2201 CACCAGTACA GCATCGACGA GCTGAAGCAC CAGGAGAACC TGATGCAGGC 
2251 CCAGCTGGAG GACGAGCAGG CGCAGCGGGA GGAGCTAGAG AAGCACAAGA 
2301 CTGCGTTTGT GGAACACCTG AATGGCTCCT TCCTGTTTGA CAGCATGTAC 
2351 GCTGAGGACT CAGAGGGCAA CAATCTGTCC TACCTGCCTG GTGTCGGTGA 
2401 GCTCCTTGAG ACCTACAAGG ACAAGTTTGT CATCATCTGC GTGAATATTT 
2451 TTGAGTATGG CCTGAAACAG CAGGAGAAGC GGAAAACAGA GCTTGACACC 
2501 TTCAGTGAAT GTGTCCGTGA GGCCATCCAG GAAAACCAGG AGCAGGGCAA 
2551 ACGCAAGATT GCCAAATTCG AGGAGAAGCA CTTGTCGAGT TTAAGTGCCA
2601 TTCGAGAGGA GTTGGAACTG CCCAACATTG AGAAGATGAT CCTAGAATGC
2651 AGTGCTGACA TCAGTGAGTT GTTCGATGCG CTCATGACGC TGGAGATGCA
2701 GCTGGTGGAG CAGCTGGAGG TAAGGCTGGG CCCTGGGCAC AAGTGCCAGA 
2751 ATCTGGCGAT GCAGCTGCAC ATCCATAGGT GAACTGTAGC CTTCATGGGC
2801 ACGCCTCTGC TGGAAACGTC CAGCACGACT CAGCGTGGCA GGCTGTAGCT
2851 TTCTTGCTCA TCAGTCCTGT TTGCTTTTAT TACATTTTAA TCATTTACAT
2901 TGGAAGTGAT TCTTGTGGAA AATGAGAGGT GAGCTCATTC TTCTGAAATG
2951 GTCCCCCTAT CCTGGAAGTC AGTGGGGAGA GGTTTTTGAT TAGACCCCTG
3001 GAGCTATCCG GGTACTCTAA AGGCAAAGCG CACCCCCACT TGGGGACCAA 
3051 ACAAAGACCC CTCCGCATTG CAGCCTGCAG TTGCCGCTTC TCAGGTGACG 
3101 TGAGGAGGCT GCAACTCAGC ACTAAGTAGT GAAAATGAAA AGCGCCGCTG 
3151 TCTGAAATTC ATTAGCAGCC AGAGTATGTG TTACAAGGCA GCGGAGGCTG 
3201 GGAGTCTGAA GTGGTGTGAT GAATTGAACC TCATCGGATG CTGCTGTGGC 
3251 TGGGCCAAGT GATAGCACCT AATCAATTCC TCACACGTCA AGTGACACCT 
3301 CAGACATGGG ATAGATTTCC CCATCACATC ACAGGGCAGG TGCTCCCTCC 
3351 CTGCTGGAGA GCACAGGCAC TGCAGAAGCA GCGCACAGTG CCAGGGGCGA 
3401 GTGAGGCAGC AGCTCCCAGC CTTTTCAGGC ACGGAGATTG CCTTTCAACA 
3451 TCCAAACATT TCCCAGAACC CATGTGCCAT CCTACTTGTA TTACTGGTGG 
3501 CCAGAAAGCC ACAAGCGCAA TCATGCTTTT CAATGACCCT ATTTTTATTC 
3551 ACGAGAACAG CACATACATG TGTTTGAAAA TTATGTGAGG TGCTCACTCT 

3601 GCAGACAGTA CTCACATTCC TATAGATTCC ACCCCTGCCC ACCTTGCAGC
3651 CCCTGGAGTC TATAGCAGAT GGGAGTGGGG CACTCCGAGA GTGGCAGGCC 
3701 TGGAGATCAC ATCTTCCATT GTTCCTTCAA TCAACACTAA CTCCCATTTG
3751 GGCCTTAGGT GCCTTGCTAA GCACCACAAA ACAGCAACTA ACTGAAAGAG 
3801 ATCTGGAGTG CCAGCCCGCT CCTACTGAGG GCCTCCTCTC TGTCAGGCAC 
3851 CTTGCAAAGC ATTTTGTGTG AAGTGACTCA TTTAACCTCA CCACAACGCC 
3901 ACAACGCAGG GATTATGCAG GTAACCTATT TCCCAGATGA GGAAGATAAG 
3951 GCCCAAGGAG GTGAAATGCC TTTCCCAGAG TTACACAGAG TGCTGGAGCT 
4001 GGGAATACTG ACCCAGGCAG TCTAGCTCTT AACAGCTCAC TCCACTGTTT 
4051 CCCTGGAGGT GATGCACAGA TGTCACTGGG AAACCCAAAG GAGAGGGGGT 
4101 TGGCTGTGTG TGTGTGTGTT GGGCAGGCAG GTAAGGGGAG TAAGACCAGG 
4151 ACAAGTGTTC CTGGCAAAGT TCCGGTGACA GCATTAAACA TTCAGATGGT 
4201 GAGGGAGTTA ATATGGTTGG AGAACAACAA CTTTAGAGAG AGCAGAGGGG 
4251 TCAGTTCACA ACCATCTGCT CAGGAGGGTC AAGATGGGTG GTCTTTATGC 
4301 TGAAGGTCTG TGATTAGAGG AGCTGGTTGC TAAATTTTGA GGAGTACCTT 
4351 TTGCTCTGTG CTGGACATCT AAATATGCAT GTTAACTGTG TTCTTTAACA 
4401 TTTCCAGGAG ACTATAAACA TGTTTGAAAG GAACATTGTT GACATGGTAG 
4451 GACTGTTTAT CGAAAATGTC CAAAGCCTAT ATCCTTTCTG TGATGACCTT 
4501 CCCCATGGGG AGGTGCTACA GAGCCCCTGG GCTTGTCCCG GCCTCTGGAC 
4551 AAAAGAATGT TCCACAGGGT CTGAGGAGGT TTCCCGACCC TCAGAACAAT 
4601 GATGGCCTGG TTAGAGCTGT GGTTTGGATG CCCAGAGGGA CAACATCCAA 
4651 ACTGTTTGCA GTAGGCTCCC AGCATGATTG TTCTCATATG AGTGATGTTC 
4701 ACTAGGAAAT GACGCCCCCT GTGTTGCAGG CAAGCACACT CTGGGGTTGA 
4751 GGCAACCCCC ACGTGGAAGA CACTATAAGG AGTACATCAG GTGAAATGTT
4801 AGGGTGAGGA GCCAACATCG GAGCATGGCC AACCCTTCTT CCACCCGAAC 
4851 TCAGGGCACT CCACATGGGG CAAACTGCTG TGCTCCAGCT AGCAGCAGCC 
4901 CTGTGGTCCT GCCCTCCTGG GGCTCACAGT CCCTCAGGGA GACAAGTTGT 
4951 AGAGGCAACA AGTGGTGCCA AATGCACAGG GTGAGAAGCA GTTAACCCAG 
5001 AGGCCAGGAG CCTCCATGCA GGAGGGAGAG AAGAGTGTGA TGGCAGGGGC 
5051 CGAGGGTCCG TCCGAGGTGT GGGGCAGGGG CAGGGAGTCG AGGAAGGCCC 
5101 AGGGTTCGGA GCTTGTGAGT GGACGGTGCT GCCAGCCAGA ATTTCCGAGC 
5151 TCGCCTTGGG CCCTTAAAGT CTGTCTCCCG CCGTCTGAGA GCATCAGGGA 
5201 CGCGCCGGGC CTGCTCCTCC CGGGCCTTTG CTTAACTCGG GGCTGCACGA 
5251 TGGCTCAGTG CCGGGACCTG GAGAATCACC ACCACGAGAA GCTCCTGGAG 
5301 ATCTCTATCA GCACCCTGGA GAAGATTGTC GAGGGCGACC TGGACGAGGA 
5351 CCTGCCTAAC GACCTGCGCG CGCTTTTTGT CGATAAAGAT ACGATTGTTA 
5401 ATGCTGTCGG GGCATCGCAC GACATCCACC TCCTGAAGAT TGACAATCGA
5451 GAAGATGAGC TGGTGACCAG AATCAACTCT TGGTGTACAC GTTTAATAGA 
5501 CAGGATTCAC AAGGATGAGA TCATGAGGAA CCGCAAGCGC GTGAAGGAGA 
5551 TCAATCAGTA CATCGACCAC ATGCAGAGCG AACTGGACAA CCTGGAATGT 
5601 GGCGACATCC TAGACTAGAT GAATGTCAGC CACAGGAGCT TCTTCAAAAC 
5651 ATAGCACCAG CCCCAGCCAG GAGAAGGAAG TGCACACGCC TCACCCGCAC 
5701 CTCTAGAGAG TTGCTGGGCA TCTCTCAACC GCGATCCCCA ACACCATTCT 
5751 TCCCCCACCC CTGGAAAAAC TTCCAAAAGT AGAGAAAATA AAGGACTCAT
5801 TTCACAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry HS1292248 from database EMBL: 
human STS SHGC-53917. 
Score = 874, P = 3.3e-33, identities = 180/185 



Medline entries 



No Medline entry 
Peptide information for frame 1



ORF from 202 bp to 876 bp; peptide length: 225 
Category: similarity to known protein 



1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 

51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 

101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII 

151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 

201 SLSVSQPCET DSSSPQVSWK RGIEE 
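
The relation between the ORF coordinates and the peptide length can be checked with a short Python sketch. It assumes 1-based, inclusive ORF coordinates that exclude the stop codon and the standard genetic code. The helper names (`orf_peptide_length`, `translate_orf`) are hypothetical and are not part of the annotation pipeline that produced this report.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate_orf(seq, start, end):
    """Translate seq[start..end] (1-based, inclusive); unknown codons become 'X'."""
    orf = seq.upper()[start - 1:end]
    usable = len(orf) - len(orf) % 3
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, usable, 3))


# Row 201 of the DNA listing above (bases 201-250), spaces removed.
row_201 = "GATGAACCAGCCGTGCAACTCGATGGAGCCGAGGGTGATGGACGATGACA"
n_term = translate_orf(row_201, 2, 49)  # base 202 is the second base of this row
```

With these assumptions, the ORF from 202 bp to 876 bp spans 675 bases, that is 225 codons, and the first codons of row 201 translate to the N-terminus of the peptide listed above.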



Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL : HSSDS22MR_1 gene: "sds22"; 

product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA 

Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143 

Entry A38439 from database PIR: 

suppressor protein sds22(+) - fission yeast (Schizosaccharomyces pombe) 
>TREMBL : 5PSD522_1 gene: "sds22+"; S. pombe sds22+ gene, complete cds. 
Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127 

Entry S43988 from database PIR: 

protein suppressor sds22 - fission yeast (Schizosaccharomyces pombe) 
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT 
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1
regulatory subunit"; S. pombe chromosome I cosmid c4A8. 
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127 

Entry CEK10D2_5 from database TREMBL: 

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2. 

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125 



BLASTP hits 



Alert BLASTP hits for DKFZphutel_20mll, frame 1 



No Alert BLASTP hits found 



Pedant information for DKFZphutel_20mll, frame 1 



Report for DKFZphutel_20mll . 1 



[LENGTH] 225

[MW] 25955.87

[pI] 4.63

[HOMOL] PIR:S68209 sds22 protein homolog - human 1e-18

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation,
farnesylation and processing) [S. cerevisiae, YKL193c] 2e-11

[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YOR373w] 2e-06

[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05

[FUNCAT] 04.07 rna transport [S. cerevisiae, YPL169c] 9e-04

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04

[EC] 4.6.1.1 Adenylate cyclase 2e-06

[PIRKW] nucleus 5e-16

[PIRKW] duplication 2e-06

[PIRKW] tandem repeat 2e-06

[PIRKW] cAMP biosynthesis 2e-06

[PIRKW] glycoprotein 2e-06

[PIRKW] phosphorus-oxygen lyase 2e-06

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 5e-16

[SUPFAM] fibromodulin 3e-07

[SUPFAM] yeast adenylate cyclase catalytic domain homology 2e-06

[SUPFAM] yeast adenylate cyclase 2e-06

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 1



SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA 

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 



Prosite for DKFZphutel_20mll . 1 

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 

PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 

PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006



(No Pfam data available for DKFZphutel_20mll.1)
DKFZphutel_20m24



group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical
C. elegans protein and to the yeast Alg9 protein.

This protein is a putative mannosyl transferase involved in the assembly of the core
oligosaccharide Glc3Man9GlcNAc2.

The new protein can find application in modulating the glycosylation of proteins and as a new
enzyme for biotechnological production processes.



strong similarity to S.cerevisiae Alg9p 

complete cDNA, complete cds, potential start at Bp 23, few EST hits 
Alg9 is involved in the assembly of the core oligosaccharide 
Glc3Man9GlcNAc2 

HSAC381 corresponding genomic DNA (2 exons) 
HSB8954 corresponding genomic DNA (1 exon)

Sequenced by AGOWA 

Locus: /map="11"

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 



1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC
51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG
101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC
151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA
201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC
251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA
301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG
351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT
401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG
451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG
501 CTGTGTGCAA GAAGTTTGGG TTGCACGTGA GTCGAATGAT GCTAGCCTTC
551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG
601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG
651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC
701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT
751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA
801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG
851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA
901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG
951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA
1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT
1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA
1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT
1151 CCACTTATAT GTCTCTGTGG CCCTGTGGCT CTCTCTGCAC TTCAGAAATG
1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT
1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA
1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC
1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA
1951 ATAAAGGTCT TCTGACATGA AAAAAAAAAA AAAAAA



BLAST Results 



Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJ159ol, complete sequence. 
Length = 42,771 

Entry HSB8954 from database EMBL: 
cSRL-50A3-U cSRL flow sorted chromosome 11 specific cosmid Homo
sapiens genomic clone CSRL-50A3. 
Length = 601 



Medline entries 



96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae: 
identification of the ALG9 gene encoding a putative 
mannosyl transferase. 



Peptide information for frame 2 



ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 



1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA
51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF
101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS
151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT
201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS
251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT
301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT
351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR
401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA
451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK
501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK
551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR
601 KAKQIRKKSG G



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_20m24, frame 2 

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II., N = 1, Score = 957, P = 2.7e-96 

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces
cerevisiae), N = 1, Score = 533, P = 2.3e-51

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME 
II., N = 1, Score = 957, P = 2.7e-96 

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 533, P = 2.3e-51 



>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II.
Length = 653

HSPs:

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96
Identities = 206/514 (40%), Positives = 296/514 (57%)

Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107

N W + FK LLS R+ A+ I+DCDE +NYWEP H + YGEGFQTWEYSP

Sbjct: 43 NNPDNDWPFSFGSVFKMLLSIRISGAIWGIINDCDEVYNYWEPLHLFLYGEGFQTWEYSP 102

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167

YAIRSY Y+ LH PA+ A + KI+VF +R + + E Y + A+CKK +

Sbjct: 103 VYAIRSYFYIYLHYIPASLFANLFGDTKIVVFTLIRLTIGLFCLLGEYYAFDAICKKINI 162

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227

R + F + S+GMF +S+AF+PSSFCM T + + + + + VA ++GW

Sbjct: 163 ATGRFFILFSIFSSGMFLASTAFVPSSFCMAITFYILGAYLNENWTAGIFCVAFSTMVGW 222

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287

PFSA LGLPI D+L++K F SL+ + V+ DS+Y+GK V+APLNI LY

Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282 

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347 

NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 

Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNI VI FAAPFGFPLS — LAYFTKVWMSQDRNVAL 340 

Query: 348 WLTLAPMYI WFI IFFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400 

+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 
Sbjct: 341 YQRFAPIILLAVTTAAWLLIFGSQAHKEERFLFPIYPFIAFFAALALDATNR LCLKK 397 

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460 

++ N L+ + + F +LS SR+ ++ Y +++Y T+ T + 

Sbjct: 398 LGMD NILSILFILCFAILSASRTYSIHNNYGSHVEI YRSLNAELTNRT-NFKNF 450 

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW QLQFIPSEFRGQLPKPFAEGPL ATRI 511 

P+ VCVGKEW+RFPSSF +P +++FI SEFRG LPKPF + TR 

Sbjct: 451 HDPIRVCVGKEWHRFPSSFFI PQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510 

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 

+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQNY 556 



Pedant information for DKFZphutel_20m24, frame 2



Report for DKFZphutel_20m24 . 2 



[LENGTH] 611 

[MW] 69863.78 

[pI] 8.91

[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-69

[PIRKW] glycosyltransferase 9e-68

[PIRKW] transmembrane protein 9e-68

[PIRKW] hexosyltransferase 9e-68

[PROSITE] MYRISTYL 9

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 7 

[KW] LOW_COMPLEXITY 6.71 % 



SEQ MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 

SEG 

PRD ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccccccch 

MEM MMMMMM 

SEQ AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH 

SEG . . . xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhccccceeeccccceeeeeccccceeecccchhhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

SEG 

PRD cchhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 

SEG xxxxxxxxxxxxx 

PRD cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 

MEM MMMMMMMMMMMMMM 

SEQ LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNI VLYNVFTPHGPDLYGT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 

MEM MMMMMMM . MMMMMMMMMMMMMMMMMMMMM 

SEQ EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 

SEG xxxxxxxxxxxxxxx 

PRD cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 
SEQ FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL

SEG 

PRD hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . 

SEQ FGLLSFSRSVALFRGYHGPLDLYPEFYRI ATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL 

SEG 

PRD eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc 

MEM 

SEQ LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD 

SEG 

PRD ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc 

MEM 

SEQ TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR 

SEG 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc 

MEM 

SEQ KAKQI RKKSGG 

SEG 

PRD hhhhhhccccc 

MEM 



Prosite for DKFZphutel_20m24 . 2 



PS00001 77->81 ASN_GLYCOSYLATION PDOC00001
PS00001 593->597 ASN_GLYCOSYLATION PDOC00001
PS00004 606->610 CAMP_PHOSPHO_SITE PDOC00004
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005
PS00005 133->136 PKC_PHOSPHO_SITE PDOC00005
PS00005 541->544 PKC_PHOSPHO_SITE PDOC00005
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005
PS00005 572->575 PKC_PHOSPHO_SITE PDOC00005
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 457->461 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00006 545->549 CK2_PHOSPHO_SITE PDOC00006
PS00006 553->557 CK2_PHOSPHO_SITE PDOC00006
PS00008 12->18 MYRISTYL PDOC00008
PS00008 14->20 MYRISTYL PDOC00008
PS00008 32->38 MYRISTYL PDOC00008
PS00008 47->53 MYRISTYL PDOC00008
PS00008 166->172 MYRISTYL PDOC00008
PS00008 182->188 MYRISTYL PDOC00008
PS00008 218->224 MYRISTYL PDOC00008
PS00008 222->228 MYRISTYL PDOC00008
PS00008 234->240 MYRISTYL PDOC00008

(No Pfam data available for DKFZphutel_20m24.2)

DKFZphutel_21dl5



group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

Sequenced by MediGenomix 
Locus: /chromosome="3" 
Insert length: 5292 bp 

Poly A stretch at pos. 5273, polyadenylation signal at pos. 5252

1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 
51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 

101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 

151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 

201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 

251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 

301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 

351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 

401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 

451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 

501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 

551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 

601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 

651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 

701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 

751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 

801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 

851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 

901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 

951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 
1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 
1051 CAGCCAGATC CTGCCTACTG AAGAGTATGA AGAGGCAATG AGCACTATGC 
1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 
1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 
1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 
1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTGC TATCCCCGTG 
1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 
1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 
1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 
1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC 
1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 
1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 
1601 GAGGCCAGCT AGAGCTTGGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 
1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 
1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 
1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 
1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 
1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 
1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 
1951 GTGAAGAATC TGGCCGTGAC CTGGCTGCCA CACTGCTATA GGCCCCAGAA 
2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 
2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC
2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 
2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 
2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 
2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 
2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 
2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 
2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 
2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG
2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT
2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 
2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA
2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 
2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 
2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 
2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT
2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG
2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG
3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA
3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT
3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT
3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC
3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA
3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC
3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT
3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA
3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA
3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA
3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA
3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC
3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG
3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT
3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA
3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT
3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC
3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG
3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG
3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG
4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG
4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT
4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT
4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT
4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT
4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC
4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA
4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC
4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG
4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC
4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAAC7GG GCTATTACTC
4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC
4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT
4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC
4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG
4751 TGACACACGG AGGCACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC
4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG
4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG
4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG
4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG
5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC
5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGGTTCCCA GCCGCGGGTC
5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA
5151 CTAAAGGTCT GGCCAATGTC TTGCCCCACC CCGCCAGCCG CGATACGGCG
5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT
5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG



BLAST Results 



Entry HSU64252 from database EMBL: 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA 
101 RARPGCHGGS GGDRPAA 



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphutel_21dl5, frame 1 
No Alert BLASTP hits found 



Peptide information for frame 2 



ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC 

51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL FVHYSNGDES SDPGPQHRAQ 

101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 

151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5, frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2, 
Score = 106, P = 0.0067 



>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298 

HSPs: 

Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03
Identities = 36/103 (34%), Positives = 44/103 (42%) 

Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144 

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLSGPGRLGPFWAARSGAPAARCAP 189 

AAP AA ++R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 8/21 (38%), Positives = 9/21 (42%) 

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 

Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232
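
The percentages in the HSP summaries can be reproduced from the raw counts. The values shown appear to be truncated rather than rounded (44/103 is about 42.7 % but is reported as 42 %), so the following Python sketch uses integer division. That reading is an inference from the figures in this report and not documented behaviour of the search program, and the function name is hypothetical.

```python
def hsp_percent(matches, aligned_length):
    """Integer percentage of matches over the aligned length, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


# Counts taken from the two HSPs against PIR:EDBE75 listed above.
first_hsp = (hsp_percent(36, 103), hsp_percent(44, 103))
second_hsp = (hsp_percent(8, 21), hsp_percent(9, 21))
```

With truncation, the first HSP gives 34 % identities and 42 % positives, and the second gives 38 % and 42 %, as reported.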



Pedant information for DKFZphutel_21dl5, frame 1 



Report for DKFZphutel_21dl5 . 1 



[LENGTH] 117 

[MW] 11797.32 

[pi] 10.68 

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22

[KW] LOW_COMPLEXITY 38.46 % 



SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 

SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc 

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 



(No Prosite data available for DKFZphutel_21dl5 . 1 ) 
(No Pfam data available for DKFZphutel_21dl5 . 1 ) 



Pedant information for DKFZphutel_21dl5, frame 2 



Report for DKFZphutel_21dl5 . 2 

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 

SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG 

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 
MEM 

SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . .xxxx 

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 

MEM 

SEQ GAPAARCAPFP 
SEG xxxxxxxxx . . 
PRD ccccccccccc 
MEM 

(No Prosite data available for DKFZphutel_21dl5 .2) 
(No Pfam data available for DKFZphutel_21dl5.2) 
DKFZphutel_22d2 



group: signal transduction 

DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the ras 
protein. Additionally, the putative protein contains an EF-hand for calcium-binding. 

G-proteins are involved in various signal transduction pathways, transferring the signal of a 
cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and YAL048c 

Sequenced by BMFZ 

Locus: /map="17" 

Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 



1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC 
51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC 
101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT 
151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT 
201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC 
251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCIAA TGTCATCTGT 
301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG 
351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT 
401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG 
451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA 
501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC 
551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG 
601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC 
651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC 
701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC 
751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT 
801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA 
851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC 
901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA 
951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT 
1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT 
1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA 
1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC 
1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA 
1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC 
1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC 
1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT 
1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA 
1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT 
1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG 
1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA 
1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC 
1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA 
1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG 
1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG 
1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT 
1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG 
1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT 
1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG 
1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC 
2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC 
2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA 
2101 AAATGGGTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG 
2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT 
2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG 
2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA 
2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA 
2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT 
2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC 
2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT 
2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT 
2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG 
2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG 
2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 
2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 
2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 
2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 
2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 
2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 
2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 
3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 
3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 
3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 
3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 
3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS *** NF1-related locus, Direct Submission; 

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396 



Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 



1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER 

51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL 

101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK 

151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN 

201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG 

251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE 

301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC 

351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT 

401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR 

451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN 

501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK 

551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11., N = 1, Score = 1357, P = 1.1e-138 

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein"; 
S.pombe chromosome III cosmid c320., N = 1, Score = 889, P = 4.4e-89 

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid 
C47C12., N = 2, Score = 408, P = 5.6e-74 

PIR:S51971 probable membrane protein YAL048c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 677, P = 1.3e-66 



>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11. 

Length = 625 



HSPs: 
Score = 1357 (203.6 bits), Expect = 1.1e-138, P = 1.1e-138 
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLSIKEE 68 

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKSDLV 123 

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQSFGEYHETPVILVGNKSDGT 128 

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEIFYYAQKAVIYPTRPLYDADTKQLTD 187 

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243 

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALIRVFKICDRDNDGYLSDTELNDFQKLCFGIPLTSTALEDVKRAVSDGCPDGVAN 247 

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTELNH 303 

L L GFL+LH LFI+RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS AVTVTRDKKIDLQKKQTQRNVF 419 

+ W +TT +++ + EL YLG+ + +A ++ VTR++K DL+ T R VF 

Sbjct: 368 AYWNMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSIRVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDI 476 

+C V+G K+ GK+ +Q+L GR + +I H S + IN V V + KYLLL ++ 
Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486 

Query: 477 SESEFLTEAEIICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRIPCLIVAAKSDLHEVKQ 536 

S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 
Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546 

Query: 537 EYSISPTDFCRKHKMPPPQAFTCNTADAPSKDIFVKLTTMAMYP 580 

+ + P +FCR+ ++P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPIKFSTGNIGQSSSPIFEQLAMMAVYP 590 



Pedant information for DKFZphutel_22d2, frame 1 



Report for DKFZphutel_22d2 . 1 



[LENGTH] 


580 


[MW] 


66541.' 


[pi] 


5.! 


56 


[HOMOL] 


TREMBL 


149 






[FUNCAT] 


99 


unci 


[ FUNCAT] 


03 


.04 1 


3e-ll 






[FUNCAT] 


03 


.99 i 


cerevisiae. 


YNL09I 


3c] 1 


[FUNCAT] 


10 


.04.1 


[FUNCAT] 


03 


.10 : 


[FUNCAT] 


11 


.01 ! 


[FUNCAT] 


03 


.22 i 


[FUNCAT] 


01 


.03.: 


8e-09 






[ FUNCAT ] 


01 


.05.1 


8e-09 






[FUNCAT] 


30 


.03 i 


[FUNCAT] 


11 


.10 ( 


[FUNCAT] 


10 


.02. ( 


[FUNCAT] 


30 


.04 ( 


[FUNCAT] 


30 


.08 i 


[FUNCAT] 


08 


.07 ' 


9e-08 






[FUNCAT] 


30 


.09 < 


YFL005w] 9e- 


-08 




[FUNCAT] 


30 


.02 ( 


[FUNCAT] 


08 


.13 ' 



"K08F11.5"; Caenorhabditis elegans cosmid K08F11. le- 



proteins 



[S. cerevisiae, YAL048c] 5e-81 
ind filament formation [S. cerevisiae, 



YKR055w] 



[S. 



8e-09 

g-proteins 



[S. 



cerevisiae, YNL098c] 8e-09 
lination [S. cerevisiae, 

cerevisiae, YNL098c] 8e-09 



YNL098c] 8e-09 

YNL098c] 8e-09 
[S. cerevisiae, YNL098c] 

[S. cerevisiae, YNL098C] 

YORlOlw] 4e-08 



01.05.04 regulation of carbohydrate utilization 

:ytoplasm [S. cerevisiae, 

[S. cerevisiae, YORlOlw] 4e-08 
[S. cerevisiae, YPR165w] 7e-08 
:ytoskeleton [S. cerevisiae, 
30.08 organization of golgi [S. cerevisiae, YPR165w] 7e-08 

08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YFL005w] 



g-proteins 



YPR1 65w] 7e-08 



30.09 organization of intracellular transport vesicles 



[S. cerevisiae, 



lembrane [S. cerevisiae, YFL005w] 9e-08 

[S. cerevisiae, YNL093w] le-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YNL093w] 1e-07 

[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07 

[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-07 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 3e-06 

[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 9e-04 




[BLOCKS] 


RT , 0 O d 1 f) Jl riwn amnn f ami 1 \? nrnffi n 




[SCOP] 


rilnlk 3 7^1 3 1 (^H-nP 1 R^i ^ orotpin f human f Homn qani pn ^ \ 


2e-42 


[SCOP] 


Hlnnpipi ^ ? S 1 "\ 10 Rani A rHum^n /Hnmn ^flnipnO Sp — S9 




[PIRKW] 


transmembrane protein l6 — 79 




[PIRKW] 


rnornhv anp I" r^f f i ^-V i t\c\ ^p-fl^ 

UL^UULS i_ ulIC LiQl 1 L L~ s\ .L 1 1 %J C \Z \J \1 




[PIRKW] acetylated amino end 3e-09 




[PIRKW] 


[JLCllylClLCU L V J LCllIC JC \J D 




[PIRKW] 


b-Lylla-L LldllhUUCUlOIl 1c U ' 




[PIRKW] transforming protein 3e-09 




[PIRKW] 


± jiuiiccii ct Lc cdiiy pr o u t? j.ri oe uo 




[PIRKW] 


a± LCi.113 D [J J. X I Li ly TC <jo 




[PIRKW] 


L 1UUU .1 C J.U 




[PIRKW] lipoprotein 7e-10 

[PIRKW] proto-oncogene 3e-09 




[PIRKW] 


mpthvl stpr) ra rhfixvl pnH To- DQ 

mc niyi.aLCU ^ cx i- \j /\ y j. ciiu «j ^ u j 




[PIRKW] membrane protein 3e-09 

[PIRKW] GTP binding 1e-10 

[PIRKW] thiolester bond 7e-10 

[SUPFAM] ras transforming protein 1e-10 

[PROSITE] ATP_GTP_A 2 

[PROSITE] MYRISTYL 3 

[PROSITE] EF_HAND 1 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 3 

[PFAM] Ras family (contains ATP/GTP binding P-loop) 

[KW] Irregular 

[KW] 3D 





SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHI VDYSE 

ljai- . . . EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC 

SEQ AEQSDEQLHQEISQANVICI VYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS 

ljai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT 

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE 

ljai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH 

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG 

ljai- 

SEQ VADSGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKIPPDCTTE 

ljai- 

SEQ LNHHAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYI PWGPDVNNTVCTNERGWITYQ 

ljai- 

SEQ GFLSQWTLTTYLDVQRCLEYLGYLGYSILTEQESQASAVTVTRDKKIDLQKKQTQRNVFR 

ljai- 

SEQ CNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDISESE 

ljai- 

SEQ FLTEAEI ICDVVCLVYDVSNPKSFEYCARI FKQHFMDSRI PCLIVAAKSDLHEVKQEYSI 

ljai- 

SEQ SPTDFCRKHKMPPPQAFTCNTADAPSKDI FVKLTTMAMYP 

ljai- 



Prosite for DKFZphutel_22d2 . 1 



PS00001 118->122 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005 
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005 
PS00005 247->250 PKC_PHOSPHO_SITE PDOC00005 
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005 
PS00006 59->63 CK2_PHOSPHO_SITE PDOC00006 
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006 
PS00006 126->130 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 143->147 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006 
PS00006 311->315 CK2_PHOSPHO_SITE PDOC00006 
PS00006 325->329 CK2_PHOSPHO_SITE PDOC00006 
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006 
PS00006 390->394 CK2_PHOSPHO_SITE PDOC00006 
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006 
PS00007 153->161 TYR_PHOSPHO_SITE PDOC00007 
PS00007 376->384 TYR_PHOSPHO_SITE PDOC00007 
PS00007 153->162 TYR_PHOSPHO_SITE PDOC00007 
PS00007 448->457 TYR_PHOSPHO_SITE PDOC00007 
PS00008 240->246 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 
PS00008 433->439 MYRISTYL PDOC00008 
PS00017 11->19 ATP_GTP_A PDOC00017 
PS00017 425->433 ATP_GTP_A PDOC00017 
PS00018 197->210 EF_HAND PDOC00018 



Pfam for DKFZphutel_22d2 . 1 



HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
++L+G+ VGK++-L ++ EF+EE +P ++ T ++ +++ 
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52 

HMM LQIWDTAGQERYRSMRPMYYRGAMGFMLVYDITNRqSFENIr.NWweEIr 
ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ 1+ 
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102 

HMM RHCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKT 
+ D+D+ P +LVGNK+-DL + ++T + +E+SAK+ 
Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151 

HMM NiNVEEAFMEIvRellqrMqeqNqteNinidQpsrnrkrCCCIM* 
N+ E F+ + +++L + +++ +++++ + C+ 
Query 152 LKNISELFYYAQKAVLHPT GPLYCPEEKEMK-PACI— 186 
DKFZphutel_22el2 



group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein with similarity to yeast, C. elegans, 
Drosophila and mammalian proteins. 

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway 
involving the EGF receptor. 

The new protein can find application in modulating the cornichon-modulated signal transduction 
pathway and also the EGF receptor signaling processes. 



strong similarity to S.cerevisiae YGL054c and cornichon 
complete cDNA, complete cds, EST hits 

cornichon is required for signal transduction in EGF receptor 
signal processing 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 519 bp 

Poly A stretch at pos. 499, no polyadenylation signal found 



1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 

51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 

101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 

151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT 

201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 

251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 

301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 

351 TTGAAGTCAG CCTACACTAC AGTGCACAGT TGAGGAGCCA GAGACTTCTT 

401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 

451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 

501 TGAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



95300228: 

cornichon and the EGF receptor signaling process are necessary for both 
anterior-posterior 

and dorsal-ventral pattern formation in Drosophila. 



Peptide information for frame 1 



ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 



1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 
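
The ORF coordinates and peptide length quoted above can be cross-checked against the insert sequence. The following is a minimal Python sketch, not part of the original analysis; the function names are hypothetical, and it assumes 1-based inclusive ORF coordinates that exclude the stop codon, together with the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError(f"ORF {start}..{end} is not a whole number of codons")
    return n_bases // 3


def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 30 bases of the ORF (positions 55-84 of the insert sequence above).
orf_start = "ATGGAGGCGGTGGTGTTCGTCTTCTCTCTC"
print(orf_peptide_length(55, 330))  # 92
print(translate(orf_start))         # MEAVVFVFSL
```

Under these assumptions the 276 bases from 55 to 330 give 92 codons, and the first ten codons translate to the first ten residues of the listed peptide.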

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054C - yeast (Saccharomyces 
cerevisiae) , N = 2, Score = 185, P = 5.7e-17 

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog"; 
S.pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12 

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 162, P = 4.8e-12 

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA, 
complete cds., N = 1, Score = 141, P = 8e-10 

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09 



>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) 

Length = 138 



HSPs: 



Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 35/85 (41%), Positives = 56/85 (65%) 

Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 60 

M A +F+ +++ C +F V+F I +DLE DYIN CSK+NK + PE H +++ 
Sbjct: 1 MGAWLFILAVVVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFIFLLNLPVATWNIYRM 85 

L L++ +WF+FLLNLPV +N+ ++ 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 37 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 7/9 (77%), Positives = 9/9 (100%) 



Query: 82 IYRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 
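
The "Identities" figure of an HSP can be recomputed directly from the aligned query and subject strings. The sketch below is illustrative only and assumes the two strings are already aligned to equal length; `count_identities` is a hypothetical helper, not part of BLAST. "Positives" additionally require a substitution matrix and are not reproduced here.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    # A gap character never counts as an identity.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Second HSP against YGL054c, as listed above.
identical, aligned = count_identities("IYRMILALI", "LYRMIMALI")
percent = 100 * identical // aligned  # truncated, not rounded
print(f"Identities = {identical}/{aligned} ({percent}%)")  # Identities = 7/9 (77%)
```

The percentages in the report appear to be truncated rather than rounded: 7/9 is listed as 77% and 56/85 as 65%.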



Pedant information for DKFZphutel_22el2, frame 1 



Report for DKFZphutel_22el2 . 1 



[LENGTH] 92 

[MW] 10614.98 

[pi] 5.04 

[HOMOL] PIR:S64058 probable membrane protein YGL054C - yeast (Saccharomyces cerevisiae) 
5e-14 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054C] 
2e-15 

[PIRKW] transmembrane protein 2e-11 

[PROSITE] CK2_PHOSPHO_SITE 3 

[KW] SIGNAL_PEPTIDE 33 

[KW] TRANSMEMBRANE 2 



SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV 
PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh 
MEM MMMMMMMMMM 



SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND 

PRD hhhhhhhheeecccccchhhhhhhhhhhhccc 
MEM MMMMMMMMMMMMMMMMMMM . .MMMMMMM. . . . 



Prosite for DKFZphutel_22el2 . 1 

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006 

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006 

PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006 



(No Pfam data available for DKFZphutel_22el2 . 1) 
DKFZphutel_22n2 



group: uterus derived 



DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



unknown 



complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chrll linkage group" 



Insert length: 1556 bp 

Poly A stretch at pos. 1534, no polyadenylation signal found 



1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 
51 ACTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 
101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 
151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 
201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 
251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG 
301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT 
351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC 
401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC 
451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC 
501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTCACCTGG ACCACAAACT 
551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT 
601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG 
651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT 
701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG 
751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG 
801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA 
851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGCTCCC 
901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT 
951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT 
1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 
1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 
1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 
1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCIGCTGTT TCAAGGCTGA 
1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 
1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT 
1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG 
1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA 
1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG 
1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC 
1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 
1551 AAAAAA 



BLAST Results 



Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 255 bp to 1166 bp; peptide length: 304 
Category: putative protein 
1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD 
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK 
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW 
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY 
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP 
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET 
301 LTFS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22n2, frame 3 

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 132, P = le-05 



>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae) 
Length = 562 

HSPs: 

Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05 
Identities = 24/63 (38%), Positives = 35/63 (55%) 

Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 
D 

Sbjct: 557 IID 559 

Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04 
Identities = 20/52 (38%), Positives = 33/52 (63%) 

Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55 

N+ +E ++E+ +E + T + + N+DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545 



Pedant information for DKFZphutel_22n2, frame 3 



Report for DKFZphutel_22n2 . 3 



[LENGTH] 304 

[MW] 34285.85 

[pi] 4.37 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] PKC_PHOSPHO_SITE 1 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 11.84 % 

SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL 
SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 



PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc 

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc 

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID 

SEG 

PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch 

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI 

SEG 

PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh 

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET 

SEG 
PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc 

SEQ LTFS 

SEG 

PRD cccc 



Prosite for DKFZphutel_22n2 . 3 



PS00001 4->8 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 290->294 ASN_GLYCOSYLATION PDOC00001 
PS00004 17->21 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006 
PS00006 45->49 CK2_PHOSPHO_SITE PDOC00006 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006 
PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00006 185->189 CK2_PHOSPHO_SITE PDOC00006 
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 
PS00009 280->284 AMIDATION PDOC00009 




(No Pfam data available for DKFZphutel_22n2 . 3) 
DKFZphutel_22o2 



group: uterus derived 

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



similarity to S.pombe SPBC3E7.03c 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="11p15.5" 
Insert length: 2714 bp 

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677 



1 GCAGGGCACG GTGGGGGCTG AGATCGTTTC CTGTTGGAAC TTCTGGCCCA 
51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 
101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 
151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGTTA AAGCGCCACC 
201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 
251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG 
301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG 
351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 
401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 
451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC 
501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 
551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 
601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 
651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 
701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 
751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 
801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 
851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 
901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 
951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 
1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA 
1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 
1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 
1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 
1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 
1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC 
1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC 
1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC 
1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT 
1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC 
1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 
1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT 
1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 
1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA 
1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG 
1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT 
1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC 
1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG 
1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT 
1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG 
2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT 
2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC 
2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA 
2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC 
2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG 
2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA 
2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG 
2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG 
2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT 
2451 CCTGTCTTCA CTGAGCGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG 
2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 
2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 
2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 
2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA 

BLAST Results 



Entry AF015416 from database EMBL: 

Homo sapiens chromosome 11 from 11p15.5 region, complete sequence. 
Score = 3356, P = 2.0e-144, identities = 672/673 

Entry HS263253 from database EMBL: 
human STS SHGC-15914. 
Score = 1143, P = 9.0e-46, identities = 245/255 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 



1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL 
51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV 
101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL 
151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL 
201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH 
251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST 
301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK 
351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE 
401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD 
451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR 
501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD 
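
The stated peptide length can be checked against the ORF coordinates. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the function name is illustrative and does not come from the annotation software.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon
    (an assumption inferred from the reported lengths, not stated in the text).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a positive multiple of 3")
    return span // 3


if __name__ == "__main__":
    # ORF from 326 bp to 1936 bp covers 1611 nucleotides, i.e. 537 codons.
    print(orf_peptide_length(326, 1936))
```

Under these assumptions the other entries are consistent as well: 354 to 941 bp gives 196 residues and 125 to 709 bp gives 195 residues.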

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22o2, frame 2 

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023 



>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c3E7. 
Length = 362 

HSPs: 

Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03 
Identities = 71/289 (24%), Positives = 124/289 (42%) 

Query: 215 SQETERAM-EILKVLFNITLDSIKGEVDEEDAALYRHLGTLLRHCVMIATAGDRTEEFHG 273 

SQ+ E + EIL++LF 1+ S E DE+ L L+ + + 

Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILL LVIEIWASSLNNNPNSPLRW 65 

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLIFLEKRLHKTH RL 327 

HA N LL NL L LD + + T + H + +LEK L+ + 
Sbjct: 66 HATNALLSFNLQLLSLDQAIYVSEIACQT LQSILISREVEYLEKGLNLCFDIAAKY 121 

Query: 328 KESVAPVLSVLTECARMHRPARKFLKAQGWPPPQVLPPLRDVRTRP-EVGEMLRNKLVRL 386 

+ ++ P+L++L + +LPDR++G+R L+RL 

Sbjct: 122 QNTLPPILAILLSLLSFFNIKQNL SMLLFPTNDDRKQSLQKGKSFRCLLLRL 173 

Query: 387 MT-HLDTDVKRVAAEFLFVLCSESVPRFIKYTGYGNAAGLLAARGLMAGGRPEGQYS 442 

+T++ ALLC + + GGAG+ M P+ + 

Sbjct: 174 LTIPIVEPIGTYYASLLNELCDGDSQQIARIFGAGYAMGISQHSETMPFPSPLSKAASPV 233 

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV--EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499 

+ + +E +I+P+TG + +E +++E+KE EA +L +F +L +N 

Sbjct: 234 FQKNSRGQENTEENNLAIDPITGSMCTNRNKSQRLE-LSQEEKEREAERLFYLFQRLEKN 292 

Query: 500 RVIQ 503 
IQ 
Sbjct: 293 STIQ 296 
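
The summary figures in an HSP header can be recomputed from the raw counts. The sketch below assumes that the percentages are truncated rather than rounded (124/289 is 42.9 %, printed as 42 %) and uses the standard Poisson relation P = 1 - exp(-E) between the Expect value and P; the function names are illustrative.

```python
import math


def percent_truncated(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero as the reports appear to do."""
    return (100 * matches) // aligned_length


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-expect)


if __name__ == "__main__":
    print(percent_truncated(71, 289))    # identities
    print(percent_truncated(124, 289))   # positives
    print(f"{p_from_expect(2.3e-03):.1e}")
```

For small Expect values the two quantities are nearly equal, which is why the report shows the same figure for Expect and P.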



Pedant information for DKFZphutel_22o2, frame 2 
Report for DKFZphutel_22o2.2 



[LENGTH] 537 

[MW] 60372.53 

[pI] 5.20 

[BLOCKS] BL00415L Synapsins proteins 

[PROSITE] MYRISTYL 4 

[PROSITE] CK2_PHOSPHO_SITE 13 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 9.50 % 



SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP 
SEG 
PRD ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhhhhhhhhhhhhccccc 

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC 
SEG 
PRD cceeeeeccccccccccccccccchhhhhhhhhhhhceeeeccccccccchhhhhhhhhh 

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ 
SEG xxxxxxxxxxxxxxx 
PRD hhhhccccchhhhhhhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh 

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV 
SEG 
PRD hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccchhh 

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc 

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP 
SEG 
PRD eeeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc 

SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG 
SEG xxx 
PRD cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc 



SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE 

SEG xxxxxxxxxxxxxxx xxxxxxxxx 

PRD chhhhhhhhhccccccccccccccccccchhhhhhhhhccccccceeecccccccchhhh 

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 
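
The [KW] LOW_COMPLEXITY figure can be reproduced from the SEG rows by counting masked positions. This is a minimal sketch, assuming each "x" in a SEG row marks one masked residue and that the percentage is taken over the full peptide length; the function name is illustrative.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Percentage of residues masked by SEG ('x' characters in the SEG rows)."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return 100.0 * masked / peptide_length


if __name__ == "__main__":
    # SEG rows of DKFZphutel_22o2.2 as printed in the report (column
    # alignment is lost in the text, only the run lengths are used here).
    seg_rows = [
        "xxxxxxxxxxxxxxx",
        "xxx",
        "xxxxxxxxxxxxxxx xxxxxxxxx",
        "xxxxxxxxx",
    ]
    print(f"{low_complexity_percent(seg_rows, 537):.2f} %")
```

With 51 masked residues out of 537 this gives 9.50 %, in agreement with the value listed above.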



Prosite for DKFZphutel_22o2.2 



PS00001 230->234 ASN_GLYCOSYLATION PDOC00001 
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005 
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005 
PS00005 84->87 PKC_PHOSPHO_SITE PDOC00005 
PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005 
PS00005 145->148 PKC_PHOSPHO_SITE PDOC00005 
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005 
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005 
PS00005 463->466 PKC_PHOSPHO_SITE PDOC00005 
PS00005 508->511 PKC_PHOSPHO_SITE PDOC00005 
PS00006 12->16 CK2_PHOSPHO_SITE PDOC00006 
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006 
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006 
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 388->392 CK2_PHOSPHO_SITE PDOC00006 
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006 
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006 
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006 
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006 
PS00006 530->534 CK2_PHOSPHO_SITE PDOC00006 
PS00008 57->63 MYRISTYL PDOC00008 
PS00008 420->426 MYRISTYL PDOC00008 
PS00008 424->430 MYRISTYL PDOC00008 
PS00008 430->436 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_22o2.2) 

DKFZphutel_23el3 



group: metabolism 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with similarity to 27K heat shock 
proteins. 

The novel protein contains a serine protease of the subtilase family with an aspartic 
acid-containing active site. Subtilases are an extensive family of serine proteases whose 
catalytic activity is provided by a charge relay system similar to that of the trypsin family 
of serine proteases but which arose by independent convergent evolution. The sequence around 
the residues involved in the catalytic triad (aspartic acid, serine and histidine) is completely 
different from that of the analogous residues in the trypsin serine proteases. Thus the novel 
protein is a new member of this family. 

The new protein can find application in modulation of proteinase activity in cells and as a 
new enzyme for proteomics and biotechnologic production processes. 



heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group" 
Insert length: 1854 bp 

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810 



1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC 
51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC 
101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG 
151 GGGGTTACCT TTGGGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC 
201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA 
251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG 
301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC 
351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT 
401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG 
451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC 
501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC 
551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG 
601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT 
651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA 
701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG 
751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG 
801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT 
851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA 
901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC 
951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC 
1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA 
1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG 
1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT 
1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA 
1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC 
1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC 
1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC 
1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT 
1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC 
1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC 
1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT 
1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC 
1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT 
1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA 
1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT 
1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA 
1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAAAA 
1851 AAAA 
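
Because every row of a sequence listing begins with the position of its first base, the row labels can be checked mechanically against the running base count. The following is a minimal sketch, assuming each row consists of one integer label followed by blocks of bases; the function name is illustrative.

```python
def check_row_labels(rows):
    """Return (row_index, found_label, expected_label) for every row whose
    position label disagrees with the number of bases seen so far.

    Assumes each non-empty row is 'LABEL BLOCK BLOCK ...' with an integer label.
    """
    mismatches = []
    expected = 1
    for index, row in enumerate(rows):
        fields = row.split()
        if not fields:
            continue  # skip blank lines between rows
        label = int(fields[0])
        bases = "".join(fields[1:])
        if label != expected:
            mismatches.append((index, label, expected))
        expected += len(bases)
    return mismatches


if __name__ == "__main__":
    rows = [
        "1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC",
        "51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC",
        "101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG",
    ]
    print(check_row_labels(rows))  # an empty list means all labels agree
```

A row label that repeats an earlier position, or that was split by a stray space, shows up as a mismatch at that row.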



BLAST Results 



Entry HS286348 from database EMBL: 
human STS TIGR-A002J47. 
Score = 510, P = 1.2e-16, identities = 102/102 

Medline entries 



95394379: 

Cloning and sequencing of a cDNA encoding the canine HSP27 protein. 
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 



Peptide information for frame 3 



ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39) 



1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD 
51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV 
101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD 
151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_23el3, frame 3 

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P = 4.3e-27 

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27 

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus 
heat shock protein HSP27 internal deletion variant b mRNA, complete 
cds., N = 1, Score = 301, P = 8.9e-27 



>PIR:JC4244 heat-shock 27K protein - dog 
Length = 209 

HSPs: 



Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27 
Identities = 80/182 (43%), Positives = 102/182 (56%) 

Query: 1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58 
M + ++PFS PS DPFRD P SRL D FG+ P++ W W S 
Sbjct: 1 MTERRVPFSLLRSPSW DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50 

Query: 59 AWPGTLRSGMVP RGPTATARFGVPAEGR--TPPPFPG EPWKVCVNVHSF 105 
WPG +R +P GP A A PA R + G + W+V ++V+ F 
Sbjct: 51 GWPGYVRP--IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108 

Query: 106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165 
PEEL VKTKDG VE++GKHEE+Q E G +S+ T K LP VDP V +SLSPEG L 
Sbjct: 109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168 

Query: 166 IEAPQVPPYSTFGE 179 
+EAP P + E 
Sbjct: 169 VEAPMPKPATQSAE 182 





Pedant information for DKFZphutel_23el3, frame 3 



Report for DKFZphutel_23el3.3 



[LENGTH] 196 

[MW] 21604.37 

[pI] 5.00 

[HOMOL] PIR:JC4244 heat-shock 27K protein - dog 3e-22 

[BLOCKS] BL01031C 

[PIRKW] blocked amino end 1e-13 

[PIRKW] acetylated amino end 4e-13 

[PIRKW] phosphoprotein 7e-21 

[PIRKW] glycoprotein 2e-11 

[PIRKW] 

[PIRKW] molecular chaperone 4e-13 

[PIRKW] alternative splicing 1e-19 

[PIRKW] pvp 1 pn<! fip— 14 

[PIRKW] ctTp"? 1 ?-! nrtnepri nTnfpi n 7p- 21 

[SUPFAM] alpha-crystallin 7e-21 

[PROSITE] SUBTILASE_ASP 1 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] Heat shock hsp20 proteins 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 7.14 % 



SEQ MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW 

SEG xxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc 

SEQ PGTLRSGMVPRGPTATARFGVPAEGRTPPPFPGEPWKVCVNVHSFKPEELMVKTKDGYVE 

SEG 

PRD cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeecccceee 

SEQ VSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLIIEAPQVPPYSTFGES 

SEG 

PRD eccchhhhhcccceeeeccccccccccccccceeeecccccceeeeeccccccccccccc 

SEQ SFNNELPQDSQEVTCT 

SEG 

PRD cccccccccceeeccc 
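
The PRD rows encode a predicted secondary structure state for each residue (h, e or c, presumably helix, strand and coil). The sketch below only tallies the composition of these rows; the thresholds that lead to keywords such as All_Alpha or All_Beta are not given in the text, so none are assumed here. The function name is illustrative.

```python
from collections import Counter


def structure_fractions(prd_rows):
    """Fraction of residues predicted as 'h', 'e' and 'c' over all PRD rows."""
    states = "".join(prd_rows).replace(" ", "")
    counts = Counter(states)
    total = len(states)
    return {state: counts.get(state, 0) / total for state in "hec"}


if __name__ == "__main__":
    # Last two PRD rows of DKFZphutel_23el3.3 as printed above.
    prd_rows = [
        "eccchhhhhcccceeeeccccccccccccccceeeecccccceeeeeccccccccccccc",
        "cccccccccceeeccc",
    ]
    for state, fraction in structure_fractions(prd_rows).items():
        print(state, f"{fraction:.2f}")
```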



Prosite for DKFZphutel_23el3.3 



PS00001 138->142 ASN_GLYCOSYLATION PDOC00001 
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005 
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005 
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005 
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 140->143 PKC_PHOSPHO_SITE PDOC00005 
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006 
PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006 
PS00008 62->68 MYRISTYL PDOC00008 
PS00008 132->138 MYRISTYL PDOC00008 
PS00136 28->39 SUBTILASE_ASP PDOC00125 



Pfam for DKFZphutel_23el3.3 



HMM_NAME Heat shock hsp20 proteins 

HMM *AMMrpPWDWRE DpDHFeVrMDMPGFKPEEIKVkVEDNNVLvIeG 
A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G 
Query 77 ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123 

HMM EHEREEEREDDkWWWHERIYRHFMRRFrLPENVDpDqlkAsMSdNGVLTI 
+HE E++ + + ++ F +++LP +VDP + AS+S++G+L I 
Query 124 KHE EKQQ EGGI VSKNFTKKIQLPAEVDPVTVFASLSPEGLLI I 166 

HMM TVPKpEP* 
++P ++P 
Query 167 EAPQVPP 173 

DKFZphutel_23gll 



group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S.pombe 
SPAC31G5.12c and S. cerevisiae Maf1p. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



similarity to SPAC31G5.12c and Maf1p 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus : unknown 

Insert length: 1674 bp 

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644 



1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 
51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 
101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 
151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 
201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 
251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG 
301 CCCAGCCAGA CCCGGCCCGG CGCGGCCTGA TCTAACCCAG CCAGGCAGGC 
351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT 
401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA 
451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG 
501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC 
551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA 
601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC 
651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC 
701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC 
751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC 
801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC 
851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC 
901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 
951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 
1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 
1051 TGGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 
1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 
1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 
1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 
1251 GCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 
1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 
1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 
1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC 
1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA 
1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 
1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 
1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 
1651 AGTTTCTGTG ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 393 bp to 1160 bp; peptide length: 256 
Category: similarity to known protein 




1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE 
51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL 
101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL 
151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF 
201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR 
251 VPVICI 



BLASTP hits 



Entry SPAC31G5_12 from database TREMBL: 
gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c31G5. 
Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127 

Entry SPD656_1 from database TREMBL: 
product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+ 
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial 
cds. 
Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127 

Entry S50986 from database PIR: 
MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST 
MAF1 PROTEIN. >TREMBL:SCI 94 92_1 gene: "MAF1"; product: "Maf1p"; 
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds. 
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S. cerevisiae 
chromosome IV cosmid 8119. 
Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133 

Entry AF098499_2 from database TREMBL: 
gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8. 
Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252 



Alert BLASTP hits for DKFZphutel_23gll, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_23gll, frame 3 

Report for DKFZphutel_23gll.3 



[LENGTH] 256 

[MW] 28869.95 

[pI] 4.51 

[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe chromosome I cosmid c31G5. 4e-23 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c] 6e-13 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 5 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 7.81 % 



SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS 

SEG 

PRD cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc 



SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR 

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc 

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED 

SEG 

PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc 



SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc 



SEQ EETSTMEEDRVPVICI 

SEG xx 

PRD cccccccccceeeccc 

Prosite for DKFZphutel_23gll.3 



PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 
PS00001 101->105 ASN_GLYCOSYLATION PDOC00001 
PS00001 132->136 ASN_GLYCOSYLATION PDOC00001 
PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005 
PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005 
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005 
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005 
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005 
PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005 
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006 
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006 
PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006 
PS00008 66->72 MYRISTYL PDOC00008 
PS00008 181->187 MYRISTYL PDOC00008 
PS00008 239->245 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_23gll.3) 

DKFZphutel_24cl9 



group: transmembrane protein 

DKFZphutel_24cl9 encodes a novel 195 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 

unknown 

membrane regions: 1 

Summary DKFZphutel_24cl9 encodes a novel 195 amino acid protein, with 
no similarity to known proteins. 



unknown 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus : unknown 

insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 



1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 
51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 
101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA 
151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 
201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 
251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 
301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 
351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT 
401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC 
451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA 
501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG 
551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT 
601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG 
651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA 
701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA 
751 ACAAAAAAAA AAAAAAAAA 
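
The reported polyadenylation signal position can be compared with a direct search for the canonical hexamers near the 3' end. This is a minimal sketch, assuming the signal is AATAAA or its common variant ATTAAA; the function name and the choice of hexamers are illustrative and are not taken from the annotation software. In this entry the search places AATAAA at 1-based position 736, one more than the reported 735, which suggests (as an inference only) that the report counts from zero or names the preceding base.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, first_position=1):
    """Return sorted (position, hexamer) pairs for each signal hexamer.

    Positions are 1-based relative to the full insert, where `first_position`
    is the position of the first base of `seq`.
    """
    seq = seq.upper()
    hits = []
    for hexamer in POLYA_SIGNALS:
        index = seq.find(hexamer)
        while index != -1:
            hits.append((first_position + index, hexamer))
            index = seq.find(hexamer, index + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Bases 701 to 769 of the DKFZphutel_24cl9 insert, copied from the rows above.
    tail = "".join([
        "GAAATTCACT", "GATTTTAAAC", "AAATATGTAA", "ACAAAAATAA", "AATGGTAAAA",
        "ACAAAAAAAA", "AAAAAAAAA",
    ])
    print(find_polya_signals(tail, first_position=701))
```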



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 



1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA 
51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET 
101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR 
151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH 
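
The peptide can be cross-checked against the nucleotide sequence by translating from the stated ORF start. The sketch below assumes the standard genetic code and 1-based coordinates; the helper names are illustrative. It translates the first ten codons of the ORF beginning at position 125.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate in frame 1 up to the first stop codon; unknown codons give 'X'."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    # Bases 101 to 160 of the DKFZphutel_24cl9 insert, copied from the listing.
    fragment = ("TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA"
                "GGAAAACATA").replace(" ", "")
    orf_start = 125                      # stated ORF start (1-based)
    offset = orf_start - 101             # index of the start codon in the fragment
    print(translate(fragment[offset:offset + 30]))
```

Ambiguous codons are reported as X rather than guessed, so OCR damage in the nucleotide rows stays visible in the translation.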

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24cl9, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_24cl9, frame 2 



Report for DKFZphutel_24cl9.2 



[LENGTH] 195 

[MW] 21527.45 

[pI] 9.36 

[PROSITE] MYRISTYL 6 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] TRANSMEMBRANE 1 



SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV 

PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh 

MEM 

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF 

PRD hhhhhhhccccccceeeeecccccccccccccccccccccccccccccceeeecccceee 

MEM MMMMMMMMMMMMMM 

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL 

PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhcchhhhh 

MEM MMM 

SEQ LIKALQLSEPGKEIH 

PRD hhhhhhhcccccccc 

MEM 



Prosite for DKFZphutel_24cl9.2 



PS00001 11->15 ASN_GLYCOSYLATION PDOC00001 
PS00001 34->38 ASN_GLYCOSYLATION PDOC00001 
PS00001 59->63 ASN_GLYCOSYLATION PDOC00001 
PS00005 18->21 PKC_PHOSPHO_SITE PDOC00005 
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005 
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00008 40->46 MYRISTYL PDOC00008 
PS00008 47->53 MYRISTYL PDOC00008 
PS00008 68->74 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 127->133 MYRISTYL PDOC00008 
PS00008 142->148 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_24cl9.2) 
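
The Prosite site counts can be reproduced by scanning the peptide with the published consensus patterns, here written as regular expressions: PS00001 N-{P}-[ST]-{P}, PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE] and PS00008 G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. This is a minimal sketch and not the scanner used for the report; the function name is illustrative, overlapping matches are reported, and start positions are 1-based.

```python
import re

# Prosite consensus patterns translated to regular expressions.
PROSITE_PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan_prosite(peptide):
    """Return {motif name: [1-based start positions]}, allowing overlaps."""
    hits = {}
    for name, pattern in PROSITE_PATTERNS.items():
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(peptide)]
    return hits


if __name__ == "__main__":
    # DKFZphutel_24cl9.2 peptide (195 residues) as listed above.
    peptide = (
        "MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIA"
        "NSLFRRILNVTKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCET"
        "CTITRSGLTGLVIGGLYPVFLAIPVNGGLAARYQSALLPHKGNILSYWIR"
        "TSKPVFRKMLFPILLQTMFSAYLGSEQYKLLIKALQLSEPGKEIH"
    )
    for motif, starts in scan_prosite(peptide).items():
        print(motif, starts)
```

For this peptide the start positions found agree with the first column of the table above. The end positions printed in the table appear to be the start plus the pattern length, which is an observation about the listing rather than a documented convention.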

DKFZphutel_24ell 



group: intracellular transport and trafficking 

DKFZphutel_24ell encodes a novel 226 amino acid protein with similarity to the human/mouse Golgi 
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides 
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular 
membrane-bound compartment. Thus, the novel protein also seems to be involved in nucleotide 
sugar transport. 

The new protein can find application in modulating the transport of nucleosides and/or 
nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-bound 
compartment. 



similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP 

complete cDNA, complete cds, EST hits 
potential start at 184, 
TRANSMEMBRANE 4 

function in the transport of nucleosides and/or nucleoside derivatives 
between the cytosol and the lumen of an intracellular membrane-bound compartment? 
Sequenced by Qiagen 
Locus: /map="8" 
Insert length: 2005 bp 

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963 



1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 

51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 

101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 

151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 

201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 

251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 

301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 

351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 

401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT 

451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG 

501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 

551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 

601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 

651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 

701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 

751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC 
801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 

851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA 
901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 
951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 

1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT 

1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 

1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 

1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 

1201 TCCCTCTCTT TTGAAAATGT AAAATAAAAC CAAAAATAGA CAACTTTTTC 

1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA 

1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG 

1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 

1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT 

1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG 

1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 

1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 

1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 

1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 

1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT 

1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT 

1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 

1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT 

1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 

1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA 
2001 AAAAA 



BLAST Results 



Entry HS012351 from database EMBL: 
human STS SHGC-31823. 
Score = 1629, P = 3.1e-67, identities = 343/354 



Medline entries 



96199248: 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 



Peptide information for frame 1 



ORF from 184 bp to 861 bp; peptide length: 226 
Category: strong similarity to known protein 



1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 
51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 
101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 
151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT 
201 TVLLPPYDDA TVNGAAKEPP PPYVSA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24ell, frame 1 

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP 
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53 

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP., N 
= 1, Score = 539, P = 5.3e-52 

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible 
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06 



>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP 
(KIAA0108). 

Length = 233 

HSPs: 



Score 


= 551 


(82.7 bits). Expect = 2.9e-53, P = 2.9e-53 




Identities = 


= 102/221 (46%), Positives = 148/221 (66%) 




Query : 


9 


RF YSNSCCLCCH VRTGT I LLGVWYL I IN AVVLLILLSALADPDQY NFSSSELGGDF- 


64 






RFYS CC CCHVRTGTI+LG WY+++N ++ ++L + P+ N +G + 




Sbjct: 


13 


RFYSTRCCGCCHVRTGTIILGTWYMVVNLLMAILLTVEVTHPNSMPAVNIQYEVIGNYYS 


72 


Query : 


65 


-EFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAITVL 


123 






E M D N C+ A+S+LM +1 +M YGA + W+IPFFCY++FDF L+ LVAI+ L 




Sbjct: 


73 


SERMAD-NACVLFAVSVLMFI ISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL 


131 


Query : 


124 


IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI 


183 






Y I+EY+ QLP +FPY+DD+++++ +CL+ I+L+F ++ + FK YLI+CVWNCY+YI 




Sbjct : 


132 


TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFIVLVFFALFIIFKAYLINCVWNCYKYI 


190 


Query : 


184 


NGRNSSDVLVYVTSN-DTTVLLPPYDDA TVNGAAKEPP PPYVSA 22 6 








N RN ++ VY +LP Y+ A V KEPPPPY+ A 




Sbjct: 


191 


NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233 
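
The percentages in the HSP header above appear to be truncated rather than rounded: 148/221 is about 66.97%, yet it is reported as 66%. The following is a minimal Python sketch of that arithmetic, assuming truncation is the convention used by the reporting program; the function name `blast_percent` is hypothetical and not part of any BLAST distribution.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Values taken from the HSP header above.
print(blast_percent(102, 221))  # identities
print(blast_percent(148, 221))  # positives
```
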





Pedant information for DKFZphutel_24ell, frame 1 



Report for DKFZphutel_24ell.1 

[LENGTH]   226 
[MW]       25419.11 
[pI]       4.65 
[HOMOL]    SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40 
[PROSITE]  CK2_PHOSPHO_SITE 3 
[PROSITE]  TYR_PHOSPHO_SITE 1 
[PROSITE]  PKC_PHOSPHO_SITE 1 
[PROSITE]  ASN_GLYCOSYLATION 3 
[KW]       SIGNAL_PEPTIDE 49 
[KW]       TRANSMEMBRANE 2 
[KW]       LOW_COMPLEXITY 20.80 % 



SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL 

SEG xxxxxxxxxxxxxxxx 

PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc 

MEM 

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY 

SEG xxxxxxxxxxxxx 

PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . . 

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA 

SEG 

PRD eecccccccceeeeeecccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphutel_24ell.1 



PS00001   54->58    ASN_GLYCOSYLATION   PDOC00001 
PS00001   187->191  ASN_GLYCOSYLATION   PDOC00001 
PS00001   198->202  ASN_GLYCOSYLATION   PDOC00001 
PS00005   167->170  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   56->60    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   128->132  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   196->200  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   186->195  TYR_PHOSPHO_SITE    PDOC00007 
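
The ASN_GLYCOSYLATION rows above can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE PS00001 pattern N-{P}-[ST]-{P} and reports 1-based start positions, which correspond to the first number in each position pair of the table; the helper name `n_glyc_sites` is hypothetical.

```python
import re

# Frame 1 peptide of DKFZphutel_24ell, copied from the listing above.
PEPTIDE = "".join("""
MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP
DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW
IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN
PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT
TVLLPPYDDA TVNGAAKEPP PPYVSA
""".split())

# N-{P}-[ST]-{P}; the lookahead lets overlapping candidate sites be found.
N_GLYC = re.compile(r"N(?=[^P][ST][^P])")


def n_glyc_sites(seq: str) -> list[int]:
    """Return 1-based start positions of putative N-glycosylation sites."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq)]


print(n_glyc_sites(PEPTIDE))  # [54, 187, 198]
```
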



(No Pfam data available for DKFZphutel_24ell.1) 
DKFZphutel_24j6 



group: cell structure and motility 

DKFZphutel_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell 
adhesion regulator (CAR1). 

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of 
cell-cell adhesion. It contains an RGD cell attachment site. 

The new protein can find application in the modulation of cell-cell adhesion. 



strong similarity to rat CAR1, A.thaliana T19C21.5 

complete cDNA, complete cds, EST hits 

potential frame shift at Bp 1241 according to CAR1 

but frame shift might be in CAR1 sequence! 

ESTs T73366 AA362984 confirm this sequence 

Sequenced by Qiagen 

Locus: /map="939.9 cR from top of Chr2 linkage group" 
Insert length: 3333 bp 

Poly A stretch at pos. 3316, no polyadenylation signal found 



1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 
51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 
101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 
151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 
201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 
251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG 
301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 
351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 
401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG 
451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA 
501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 
551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 
601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG 
651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 
701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 
751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT 
801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG 
851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 
901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 
951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 
1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 
1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 
1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 
1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 
1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 
1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 
1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 
1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 
1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 
1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA 
1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 
1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 
1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 
1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC 
1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT 
1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC 
1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 
1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 
1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 
1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 
2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 
2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 
2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 
2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 
2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 
2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT 
2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT 
2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC 
2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT 
2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA 
2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG 
2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA 

2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT 

2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT 

2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT 

2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC 

2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG 

2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG 

2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC 

2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT 

3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC 

3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT 

3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA 

3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT 

3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC 

3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA 

3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA 
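
The poly A annotation can be checked against the last sequence row. The sketch below is a minimal Python helper (the name `longest_a_run` is hypothetical) that finds the longest run of A in a fragment. For this record, the reported position 3316 coincides with a 0-based offset of the first A of the tail; that is an observation from the listing, not a documented convention of the annotation software.

```python
import re


def longest_a_run(seq: str):
    """Return (0-based start, length) of the longest run of 'A', or None."""
    best = None
    for m in re.finditer(r"A+", seq.upper()):
        if best is None or len(m.group()) > best[1]:
            best = (m.start(), len(m.group()))
    return best


# Bases 3301-3333 of the insert, copied from the last row above.
tail = "".join("ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA".split())
start, length = longest_a_run(tail)

# 3300 is the 0-based offset of base 3301 within the insert.
print(3300 + start, length)  # 3316 17
```
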



BLAST Results 



Entry HS389210 from database EMBL: 
human STS SHGC-10164. 
Score = 1592, P = 1.5e-64, identities = 346/364 

Entry HS933343 from database EMBL: 
human STS WI-16551. 
Score = 1193, P = 5.7e-46, identities = 241/244 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 



1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL 
51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN 
101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT 
151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS 
201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN 
251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV 
301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA 
351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL 
401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV 
451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN 
501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL 
551 FACGPDAKEV RKENQANTSV V 
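
The ORF coordinates and the peptide length are tied together by simple arithmetic: the inclusive span from 315 bp to 2027 bp is 1713 bases, or 571 codons, so the stated end coordinate excludes the stop codon. A minimal Python sketch of that check follows; the function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length for an ORF given inclusive 1-based coordinates.

    Assumes the end coordinate is the last base of the last sense codon,
    as the figures in this listing suggest.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(orf_peptide_length(315, 2027))  # 571
```
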



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_24j6, frame 3 

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; 
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds., N 
= 1, Score = 1472, P = 7.2e-151 

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II 
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437, P 
= 2.8e-60 

TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid 
R09B5., N = 2, Score = 323, P = 1.5e-43 



>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; 
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 
Length = 405 

HSPs: 

Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151 
Identities = 288/319 (90%), Positives = 297/319 (93%) 

Query:   1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60 
           MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 
Sbjct:   1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60 

Query:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120 
           TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL 
Sbjct:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120 

Query: 121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180 
           L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI 
Sbjct: 121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180 

Query: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240 
           DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK 
Sbjct: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240 

Query: 241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300 
            EE+ELKQL   KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV 
Sbjct: 241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300 

Query: 301 SYYNQPVFLAGMGLAF-LY 318 
           SYYNQPVFL   G  F LY 
Sbjct: 301 SYYNQPVFLGWHGPGFPLY 319 





Pedant information for DKFZphutel_24j6, frame 3 

Report for DKFZphutel_24j6.3 

[LENGTH]   571 
[MW]       62542.72 
[pI]       6.08 
[HOMOL]    TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141 
[BLOCKS]   BL00341D 
[PROSITE]  MYRISTYL 15 
[PROSITE]  MITOCH_CARRIER 1 
[PROSITE]  CK2_PHOSPHO_SITE 6 
[PROSITE]  PROKAR_LIPOPROTEIN 1 
[PROSITE]  PKC_PHOSPHO_SITE 4 
[PROSITE]  ASN_GLYCOSYLATION 4 
[PFAM]     Laminin B (Domain IV) 
[KW]       TRANSMEMBRANE 4 
[KW]       LOW_COMPLEXITY 8.76 % 



SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 

SEG 

PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee 

MEM MMMMMMMMMMMMM 

SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 

SEG . xxxxxxxxxxxxxxxx 

PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MMMMMMMmMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh 

MEM MMMMMMM 

SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 

SEG 

PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh 

MEM 

SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 

SEG 

PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee 

MEM 

SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF 

SEG 

PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh 
MEM 

SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP 

SEG xxx 

PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc 

MEM 

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL 

SEG xxxxxxxxxx 

PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh 

MEM MMMMMKMMM MMMMMMMMMMMMMMM 

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR 

SEG 

PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee 

MEM KMMMMMMMMMMMMMMMMMMMMMMMMhM. . . 

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV 

SEG 

PRD eecccccceeeeccccchhhhhhhhcccccc 

MEM 



Prosite for DKFZphutel_24j6.3 

PS00001   100->104  ASN_GLYCOSYLATION   PDOC00001 
PS00001   174->178  ASN_GLYCOSYLATION   PDOC00001 
PS00001   434->438  ASN_GLYCOSYLATION   PDOC00001 
PS00001   567->571  ASN_GLYCOSYLATION   PDOC00001 
PS00005   23->26    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   176->179  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   294->297  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   487->490  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   16->20    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   36->40    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   294->298  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   396->400  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   403->407  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   445->449  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   12->18    MYRISTYL            PDOC00008 
PS00008   65->71    MYRISTYL            PDOC00008 
PS00008   76->82    MYRISTYL            PDOC00008 
PS00008   193->199  MYRISTYL            PDOC00008 
PS00008   267->273  MYRISTYL            PDOC00008 
PS00008   311->317  MYRISTYL            PDOC00008 
PS00008   336->342  MYRISTYL            PDOC00008 
PS00008   339->345  MYRISTYL            PDOC00008 
PS00008   353->359  MYRISTYL            PDOC00008 
PS00008   368->374  MYRISTYL            PDOC00008 
PS00008   373->379  MYRISTYL            PDOC00008 
PS00008   435->441  MYRISTYL            PDOC00008 
PS00008   461->467  MYRISTYL            PDOC00008 
PS00008   490->496  MYRISTYL            PDOC00008 
PS00008   494->500  MYRISTYL            PDOC00008 
PS00013   122->133  PROKAR_LIPOPROTEIN  PDOC00013 
PS00215   404->414  MITOCH_CARRIER      PDOC00189 


Pfam for DKFZphutel_24j6.3 



HMM_NAME Laminin B (Domain IV) 

HMM * YWRlPERFLGDQvTsYGGkLG* 

Y+R + LG+++ + G + + 
Query 538 YFRFAQNTLGNKLFACGPDAK 558 
DKFZphutel_2h3 



group: differentiation/development 

DKFZphutel_2h3 encodes a novel 267 amino acid protein, with similarity to ITM2 (integral 
membrane protein 2) of chicken and mouse. 

The novel protein contains a prenyl group binding site (CAAX box) and seems to be post- 
translationally modified by the attachment of either a farnesyl or a geranyl-geranyl group. 
The similar G. gallus protein E25 is a marker for chondro-osteogenic differentiation. 

The new protein can find application as a useful marker for chondro-osteogenic cell 
differentiation and for the modulation of chondro-osteogenic cell differentiation. 
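
The CAAX box mentioned above is a C-terminal motif, so it can be tested with an anchored pattern. The sketch below assumes the PROSITE PS00294 pattern is C-{DENQ}-[LIVM]-x at the C-terminus, quoted from memory and best verified against PROSITE; the helper name `has_caax_box` is hypothetical. The example input is the C-terminal end of the frame 2 peptide of this clone.

```python
import re

# Assumed form of PS00294: C-{DENQ}-[LIVM]-x, anchored at the C-terminus.
CAAX = re.compile(r"C[^DENQ][LIVM].$")


def has_caax_box(peptide: str) -> bool:
    """True if the peptide ends in a CAAX-like prenylation motif."""
    return CAAX.search(peptide) is not None


c_terminus = "RHFENTFVVETLICGVV"  # last 17 residues of the 267 aa peptide
print(has_caax_box(c_terminus))  # True
```
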



strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 2033 bp 

Poly A stretch at pos . 2007, polyadenylation signal at pos . 198S 



1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCGGCG 
51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCG m 3GCTGG CATCAAGGGC 
101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC 
151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT 
201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG 
251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA 
301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC 
351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA 
401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC 
451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC 
501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC 
551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT 
601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC 
651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG 
701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG 
751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG 
801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG 
851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC 
901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTCC TTAGCTTGTA 
951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC 
1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT 
1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT 
1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG 
1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG 
1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC 
1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG 
1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTGGGGCTT GAGCTGAGGA 
1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC 
1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG 
1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT 
1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC 
1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA 
1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC 
1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA 
1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT 
1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG CTTAGGGGAA 
1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT 
1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC 
1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT 
1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT 
2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG 



BLAST Results 



Entry B64417 from database EMBL: 

CIT-HSP-2023A7.TR CIT-HSP Homo sapiens genomic clone 2023A7. 
Length = 715 
Plus Strand HSPs: 
Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64 
Identities = 310/311 (99%) 



Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA 
library subtraction. Molecular cloning and characterization of a gene 
belonging to a novel multigene family of integral membrane proteins. 



Peptide information for frame 2 



ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK 
51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY 
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR 
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE 
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_2h3, frame 2 

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16)., N = 1, Score = 573, P = 1.3e-55 

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N = 
1, Score = 560, P = 3.2e-54 

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1, 
Score = 456, P = 3.3e-43 



>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN 
E3-16). 

Length = 262 

HSPs: 

Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55 
Identities = 117/264 (44%), Positives = 172/264 (65%) 

Query:   1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60 
           MVK+SF A+A + A+K ++ ++L+ P ++P G 
Sbjct:   1 MVKVSFNSALA--HKEAAMKEEENS-------QVLILPPDAKEPEDVVVPAGHKRAWCWC 51 

Query:  61 LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLY-EDSLS-----SQVRTQM-- 112 
           + G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 
Sbjct:  52 MCFGLAFMLAGVILGGAYLYKYFAFQQ---GGVYFCGIKYIEDGLSLPESGAQLKSARYH 108 

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172 
           +E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT++ 
Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168 

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232 
           V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R 
Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228 

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 
           + + I KR A NC IRHFEN F +ETLIC 
Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 








Pedant information for DKFZphutel_2h3, frame 2 

Report for DKFZphutel_2h3.2 

[LENGTH]   267 
[MW]       30253.96 
[pI]       8.16 
[HOMOL]    SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3- 1e-49 
[PROSITE]  MYRISTYL 4 
[PROSITE]  PRENYLATION 1 
[PROSITE]  CAMP_PHOSPHO_SITE 3 
[PROSITE]  CK2_PHOSPHO_SITE 3 
[PROSITE]  TYR_PHOSPHO_SITE 1 
[PROSITE]  PKC_PHOSPHO_SITE 4 
[PROSITE]  ASN_GLYCOSYLATION 1 
[KW]       TRANSMEMBRANE 1 
[KW]       LOW_COMPLEXITY 15.36 % 



SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI 

SEG . . xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN 

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG xx 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphutel_2h3.2 

PS00001   169->173  ASN_GLYCOSYLATION   PDOC00001 
PS00004   50->54    CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   187->191  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   232->236  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   49->52    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   209->212  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   227->230  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   235->238  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   30->34    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   110->114  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   209->213  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   119->127  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   52->58    MYRISTYL            PDOC00008 
PS00008   71->77    MYRISTYL            PDOC00008 
PS00008   138->144  MYRISTYL            PDOC00008 
PS00008   243->249  MYRISTYL            PDOC00008 
PS00294   264->268  PRENYLATION         PDOC00266 

(No Pfam data available for DKFZphutel_2h3.2) 
DKFZphmcf1_1a11 



group: transmembrane protein 

DKFZphmcf1_1a11 encodes a novel 393 amino acid protein with weak similarity to S.pombe 
SPBC29A3_3 protein and S. cerevisiae putative membrane protein YDR255c. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of mammary carcinoma- 
specific genes and as a new marker for mammary carcinoma cells. 



similarity to YDR255C and SPBC29A3.03C 
membrane regions: 1 

Summary DKFZphmcf1_1a11 encodes a novel 393 amino acid protein, with 
similarity to YDR255c and SPBC29A3.03c. 



similarity to YDR255c and SPBC29A3.03c 

complete cDNA, complete cds, EST hits 

potential start at Bp 110 matches kozak consensus 
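
The Kozak note can be checked directly against the bases surrounding the proposed start codon. The sketch below uses the commonly cited simplification of the Kozak consensus (a purine at position -3 and a G at position +4, counting the A of ATG as +1); the strong/adequate/weak grading and the function name are illustrative assumptions, not the criteria used by the original annotators.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Grade the Kozak context of the ATG starting at 0-based atg_index."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    purine_minus3 = seq[atg_index - 3] in "AG"  # position -3
    g_plus4 = seq[atg_index + 3] == "G"         # position +4
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


# Bases 101-118 of the insert; the ATG at bp 110 is index 9 here.
context = "GAGGCCACCATGGAGCAG"
print(kozak_strength(context, 9))  # strong
```
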



Sequenced by DKFZ 

Locus: /map="542.7 cR from top of Chr5 linkage group" 
Insert length: 1819 bp 

Poly A stretch at pos. 1808, no polyadenylation signal found 



1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC 
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT 
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT 
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG 
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC 
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG 
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG 
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC 
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA 
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC 
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT 
551 GTGGACTTGG ATTTCAAGCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA 
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC 
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC 
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA 
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC 
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG 
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT 
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG 
951 AGTCCCCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG 
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG 
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT 
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT 
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC 
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA 
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT 
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGTCT 
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG 
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT 
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA 
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG 
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT 
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC 
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG 
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC 
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA 
1801 ACAAATGTAA AAAAAAAAA 


BLAST Results 



Entry HS579359 from database EMBL: 
human STS WI-6350. 
Score = 1027, P = 9.9e-40, identities = 207/209 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 1288 bp; peptide length: 393 
Category: similarity to unknown protein 



1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG 
51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE 
101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL 
151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH 
201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS 
251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN 
301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP 
351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF 
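
The peptide above can be cross-checked against the cDNA by translating from the annotated start at bp 110. The following is a minimal Python sketch using the standard genetic code; the helper names are hypothetical, and only the first 39 bases of the ORF are shown for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Bases 110-148 of the DKFZphmcf1_1a11 insert (first 13 codons of the ORF).
orf_start = "ATGGAGCAGTGTGCGTGCGTGGAGAGAGAGCTGGACAAG"
print(translate(orf_start))  # MEQCACVERELDK
```
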

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1a11, frame 2 

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P = 
3.4e-42 

PIR:S67312 probable membrane protein YDR255c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 271, P = 5.3e-22 

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid 
T07D1., N = 1, Score = 193, P = 5.6e-13 



>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; 
S.pombe chromosome II cosmid c29A3. 
Length = 398 

HSPs: 

Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 55/142 (38%), Positives = 89/142 (62%) 

Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311 
           Y +LD W + F R+ C+ LG+S+ESPL + +G +ALP+L+ + ++++++ 
Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIVVNAGAIALPILLKMSSIMKKKHTE 316 

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371 
           W + ELP+EI L +HSVF CP+ ++Q ++ NPP+ + CGHVI +++L +L 
Sbjct: 317 --WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374 

Query: 372 G--KLKCPYCPMEQNPADGKRIIF 393 
           G + KCPYCP E AD R+ F 
Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398 

Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42 
Identities = 51/221 (23%), Positives = 102/221 (46%) 

Query:  22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81 
           G C L EL +++L+P++LVCK+ L K 
Sbjct:  15 GNKCLAKLNEL----ESILKDAKKSCLKD-PTTSMKELVA--CSEKTQQVFDDLKRTEKK 67 

Query:  82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141 
           H+S++R GK +++ F+ ++ + + +++++++ + A+ H ++QG + +A C 
Sbjct:  68 FHTSLNRFGKTLEKKFNFDLEDIKLHSSFESKKRE---IDTALSLHFFRQGDVELAHLFC 124 

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201 
           +E+ + + F L I++ + ++DL +EWA R L SSLE+ L + 
Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184 

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242 
           + K+A+YR+ F4H +IQ M +L + 
Sbjct: 185 VSNYL--TTKDIMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225 
Pedant information for DKFZphmcf1_1a11, frame 2 

Report for DKFZphmcf1_1a11.2 

[LENGTH]   393 
[MW]       44414.77 
[pI]       6.15 
[HOMOL]    TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39 
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YDR255C] 8e-23 
[PIRKW]    transmembrane protein 2e-21 
[PROSITE]  MYRISTYL 2 
[PROSITE]  AMIDATION 1 
[PROSITE]  CK2_PHOSPHO_SITE 3 
[PROSITE]  PROKAR_LIPOPROTEIN 1 
[PROSITE]  TYR_PHOSPHO_SITE 3 
[PROSITE]  PKC_PHOSPHO_SITE 1 
[PROSITE]  ASN_GLYCOSYLATION 1 
[KW]       TRANSMEMBRANE 1 







SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh 

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS 

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 

PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI 

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhcccccccc c-ccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF 

PRD eehhhhhhhccccccccccccccchhhhhcccc 

MEM 



Prosite for DKFZphmcf1_1a11.2 

PS00001   189->193  ASN_GLYCOSYLATION   PDOC00001 
PS00005   180->183  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   28->32    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   135->139  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   190->194  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   211->219  TYR_PHOSPHO_SITE    PDOC00007 
PS00007   27->36    TYR_PHOSPHO_SITE    PDOC00007 
PS00007   244->253  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   37->43    MYRISTYL            PDOC00008 
PS00008   50->56    MYRISTYL            PDOC00008 
PS00009   387->391  AMIDATION           PDOC00009 
PS00013   282->293  PROKAR_LIPOPROTEIN  PDOC00013 

(No Pfam data available for DKFZphmcf1_1a11.2) 
DKFZphmcf1_1c23 



group: mammary carcinoma derived 

DKFZphmcf1_1c23.1 encodes a novel 311 amino acid proline rich protein. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of mammary carcinoma- 
specific genes. 



unknown, proline rich protein 

complete cDNA, complete cds? potential start at Bp 50, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3077 bp 

Poly A stretch at pos . 3067, polyadenylation signal at pos . 304B 
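The polyadenylation signal position quoted above can be cross-checked by scanning the insert for candidate signal hexamers. The following is a minimal sketch, assuming the two common hexamers AATAAA and ATTAAA are the motifs of interest; the function name `find_polya_signals` and the choice of motifs are illustrative assumptions and are not taken from the annotation pipeline used for this listing.

```python
import re


def find_polya_signals(seq, motifs=("AATAAA", "ATTAAA")):
    """Return sorted (1-based start, motif) pairs for candidate poly(A) signals.

    Whitespace and position labels are stripped first, so a block copied
    straight from the numbered listing can be passed in.
    Assumption: only the two hexamers given in `motifs` are searched.
    """
    seq = re.sub(r"[^ACGTacgt]", "", seq).upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, motif))
            # Step by one base so overlapping occurrences are not missed.
            start = seq.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Toy input; positions are relative to the string that is passed in.
    print(find_polya_signals("AATAAAATTAAA"))
```

When only the 3' end of an insert is scanned, the offset of the excerpt has to be added to the reported positions to compare them with the coordinates in the listing.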



1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT 
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG 
101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT 
151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA 
201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC 
251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT 
301 GGGCCTCCCA GGGAGGACGT AGGTGCGCCC CTGGTCACGC CCTCGCTCCT 
351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC 
401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA 
451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT 
501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC 
551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC 
601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA 
651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG 
701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG 
751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC 
801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC 
851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG
901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT 
951 CCCGGGCTCA GACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC 
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT 
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC 
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC 
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT 
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA 
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG 
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC 
1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT 
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG 
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT 
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA 
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG 
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT 
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC 
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG 
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT 
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT 
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG 
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT 
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT 
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT 
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT 
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC 
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC 
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT 
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC 
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG 
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT 
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA
2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAAAAGC CTCTGATTTT
2901 TAATTTCCAT AAAATGTTAG AAGTATATAT ATACATATAT ATATTTCTTT
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT
3051 TAAATGATTG AAACTTGAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 
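The ORF coordinates and the peptide length reported above are tied together by simple arithmetic: an ORF running from bp 49 to bp 981 spans 933 nucleotides, i.e. 311 codons. The sketch below makes that check explicit. The helper name `peptide_length` and the `includes_stop` flag are illustrative assumptions; in this listing the quoted ORF end appears to exclude the stop codon.

```python
def peptide_length(orf_start, orf_end, includes_stop=False):
    """Number of amino acids encoded by an ORF given 1-based inclusive coordinates.

    Assumption: `includes_stop` says whether the stop codon lies inside the
    quoted coordinates; it defaults to False.
    """
    n_nt = orf_end - orf_start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3 nucleotides")
    codons = n_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(peptide_length(49, 981))
```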



1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 
51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 
101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 
151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 
201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 
251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1c23, frame 1

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P = 
6.1e-15 

PIR:A28996 proline-rich protein M14 precursor - mouse, N = 1, Score = 
191, P = 3.8e-13 



>PIR:S49915 extensin-like protein - maize 
Length = 1,188

HSPs: 

Score = 215 (32.3 bits), Expect = 6.1e-15, P = 6.1e-15 
Identities = 81/269 (30%), Positives = 115/269 (42%) 
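The percentages and P values in these HSP headers can be reproduced with two small calculations. The sketch below assumes that the percentages are truncated rather than rounded (115/269 is shown as 42%, which fits truncation) and that P is related to Expect by the usual Poisson relation P = 1 - exp(-E), which is why the two values coincide for small E. Both function names are illustrative assumptions.

```python
import math


def hsp_percent(matches, aligned):
    """Percentage as printed in the HSP header (assumed to be truncated)."""
    return (100 * matches) // aligned


def p_from_expect(expect):
    """P = 1 - exp(-E); expm1 keeps precision when E is very small."""
    return -math.expm1(-expect)


if __name__ == "__main__":
    print(hsp_percent(81, 269), hsp_percent(115, 269))
    print(f"{p_from_expect(6.1e-15):.1e}")
```

For the weakest HSPs in this report (for example Expect = 1.3e-02) P is only marginally smaller than E, so both still print as the same two-digit value.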



Query: 


5 


Sbjct: 


598 


Query: 


56 


Sbjct: 


655 


Query: 


116 


Sbjct: 


712 


Query: 


175 


Sbjct: 


772 


Query: 


234 


Sbjct: 


824 


Score 


= 206 



PPP 



--VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP — DPPP A 55 

V SP P P SP PA +SS ++ PP +P PPP + 



PP P PA S P 



PP + + P + PS 



P+ 



PS P++P+ 



+ ++SP PAP S 



+LA 



S + + PP 



+Q+ P +P++ L V+ + + PP AP 

-PPPVQVSSPPPAPKSSPPLAP— VSSPPQVEKTSPPPAPL 823 



SP 



P V V PPP 



S P 



P+++PP 



206 (30.9 bits), Expect = 9.1e-14, P = 9.1e-14
Identities = 82/261 (31%), Positives = 108/261 (41%)

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69

P P G P SP + PAAS+ ST + P P+P P P P P P +P

Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128

+ P PV G S P V P + +V+L AP G+P P + ++P P

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188

+ G SP P P S + +K+ AG + P PPE P PP AS

Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP--PPEKSPPPPAPVASPPP 577

Query: 189 FIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVARKPS- 247

+ S L P P ++ VA + PP P SP P PVA P

Sbjct: 578 PVKSPPPPTLVASPP-- PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635

Query: 248 VGVPPP ASPSYPRAEPLTAPPTNGLPHTQD 277

+ PPP +SP P P PP P ++

Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669




Score = 202 (30.3 bits), Expect = 2.9e-13, P = 2.9e-13
Identities = 81/254 (31%), Positives = 110/254 (43%)

Query: 16 SPEPAGPSGSPELV--SSP— AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70 

SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+ 
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKPA SPPAHVS 872 

Query: 71 KLPQ KEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQ 126 

P+ P + PP E +P TP L ++S P +P + P + 

Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932 

Query: 127 KPLRRAL SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183 

P+ + + ++SP PAP S A K+ A L P PPE + PP +P 

Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP--KSSPPPAPVNL P — PPEVKSSPPPTP 984 

Query: 184 AST AS FI FSKGSRKLQLERPVS PETQADLQRNLVAELRSI SEQRPPQAPKKS PKAPPPVA 243 

S+ + P PE ++ V+ + PP AP SP PPPV 

Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PPPVK 1042 

Query: 244 RKPS VGVPPPASPSYPRAEPLTAPP 268

P V PPP S P P+++PP 
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070 

Score = 190 (28.5 bits), Expect = 7.9e-12, P = 7.9e-12 
Identities = 74/264 (28%), Positives = 111/264 (42%) 



Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63

PPP S PE + P P +P + T+++ PP PP P+P

Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123

P K P K PP+E V +P TP V +P PTP P

Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK--SSPPPAPVSSP--PPTPVSSPP 753

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183

A P+ S ++SP PAP S A ++K+ + + + P PP + PP +P

Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP-- PPAPKSSPPLAP 806

Query: 184 ASTASFIFSKGSRKLQLERP-VSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPV 242

S+ + LP + + P++ +V+ + + PP AP SP P

Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP--HVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268

A P+ V PP P++P P +EP ++PP

Sbjct: 865 ASPPAHVSSPPEVVKPSTPPAPTTVISPPSEPKSSPP 901




Score = 189 (28.4 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 86/271 (31%), Positives = 112/271 (41%)




Query: 


5 


PPPEEAFFSVASPEPAGPSGSPEL-VSSP — AASSSSATALQIQPPG — SPDPPPAP 


56 






PPP AS P P S P + VSSP A SS A PP PPPAP 




Sbjct : 


768 


PPP — APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 


825 


Query : 


57 


PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 


116 






P AP SS P V P PV S PP V +P +TP V +P 




Sbjct: 


826 


PPLAPKSSPPHVVVSSPP — PVVKSS PPPAPVSSPPLTPKPASPPA — HVSSPPEVV 


878 


Query: 


117 


TPALGPSAPQKPLRRALS GRASP VPAPSSGLHAAVRLKAC-SLAASEGL SSAQP 


169 






P+ P AP + ++SP P P S V+ ++ +S + SS P 
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Sbjct: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPVVV 937 

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228 

+ PP + PP +P S+ + P PE ++ V+ + P 

Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996 

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P AP SP PPP + P V PPP S P P+++PP 

Sbjct: 997 PPAPMSSP--PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038 

Score = 181 (27.2 bits), Expect = 8.8e-11, P = 8.8e-11
Identities = 73/277 (26%), Positives = 105/277 (37%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKL PQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111 

PPAP + SPV++ PKP + GPP+ P P ++S 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584

Query: 112 PG--GAPTPALGPSAPQKPLRRA LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166 

P +PP+ PP+ + PPS AV ++ + 

Sbjct: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644 

Query: 167 AQPNGPPEAEPRPPQSPASTASFI FSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQ 226 

PPE P PP PA + + ++ PE L+ + 

Sbjct: 645 VSSPPPPEKSP-PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702 

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP P K P +P P K V PP S P P+++PP 
Sbjct: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10
Identities = 78/264 (29%), Positives = 105/264 (39%) 

Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP--DPPPAP-- 56 

PPP +P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEVVK-PSTPPAPTTV — I SPPSEPKSSPPPTPVS 906 

Query: 57 -PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

P P SS P + P P PP V IP P++ V +P 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPVVVSSP — PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P + P+ • P ++SP PPS A + S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV KSSPPPTPVSSPPPAPK SSPPPAPMSSP-P— PPEV 1009 

Query: 176 EPRPPQS PASTAS FIFSKGSRKLQLERPVSPETQADLQRNLVAELRS I SEQRPPQAPKKS 235 

+ PP +P S+ + P P ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 

Query: 236 PKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P PPPV P V PPP S P P+++PP 
Sbjct: 1069 P— PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10 
Identities = 82/267 (30%), Positives = 110/267 (41%)

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69 

P P G P SP + PAAS+ ST + P P+P P P PPP +P 
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468 

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128 

+P PV G S P V P + +V+L AP G+P P + ++P P 

Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQS PASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP — PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPV SPETQADLQRNLVAELRS ISEQRPPQA PK 233 

+ S L P SPA+ + ++S ++ PP P 

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636 

Query: 234 KSPKAPPPVARKP SVGVPPPASPSYPRAEPLTAPPTN 270 

KSP P PV+ P PPP + S P E PPT+ 

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTS 676 

Score = 170 (25.5 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 78/279 (27%), Positives = 108/279 (38%) 
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Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64 

PP S S + P 4-P + P SS A+ PP +P +PP P SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS — SPP-PVVVSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG — GAPTPALGP 122 

P V P PV PP +P PL ++S P +P PA 

Sbjct: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG— RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

Sbjct: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS--P--PPPVKSPPP 1046 

Query: 181 QS PASTAS FIFSKGSRKLQLERPVSPETQADLQRNLVAELRS I SEQRPPQAPKKSPKAPP 240 

+ P S+ + P P ++ V+ + PP AP SP PP 

Sbjct: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PP 1103 

Query: 241 PVARKPS VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA PS P P+++PP P + ++ L 
Sbjct: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits), Expect = 2.1e-09, P = 2.1e-09
Identities = 75/266 (28%), Positives = 104/266 (39%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP--PPPEKSPPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P P P P ++ P PAP + V+ S ++S P P + 

Sbjct: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP— PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK- -PS VGVPPPAS PS YPRA — EPLTAPP 268 

P +PPP + PS PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits), Expect = 2.7e-09, P = 2.7e-09 
Identities = 75/267 (28%), Positives = 102/267 (38%) 

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

Sbjct: 496 ASTPPP— SLVKLSPPQAPVGSPPPPVKTTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

Sbjct: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

Sbjct: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 17 9 PP — QS PASTAS FIFSKGSRKLQLERPV SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP+PSSKLPSPQ S ++P +P 

Sbjct: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP — SPP 721 

Query: 234 KSPKAPPPVARKPS VGVPPPAS PS YPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
Sbjct: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits), Expect = 4.6e-09, P = 4.6e-09 
Identities = 81/268 (30%), Positives = 108/268 (40%) 

Query: 5 PPPEEAF FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP-- 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 --APPAPAPASSAPGHVAKLPQKEPVGC SKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+PP PAP +S+P + P PV K PP P ++S 

Sbjct: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P + PPLPSP P + + ++P PSS + + S SS 

Sbjct: 680 SPPPEKSLPPPTLIPSPP — PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 
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Query: 


168 


QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227 






D DD C-riJ_7\ J_C C W D 4- D 4- 4.-1. 4. 

f f ir bF + A T 0 0 IS. f 1 r T T *r T 




Sbjct: 


737 


FPAPVbSPPPTPVSSPPALAP--VbbFFbVi\.bb — Ft'WrLabf r FAFtJVKbbFFFVyVbb 


/ y j 


Query: 


228 


PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 








DD IDK CD D4-7i D DD -4- P PT ■( l-PP 

c if t\trt\ or F+A ir V rr t ir trliT-rff 




Sbjct: 


794 


FPPAPKSbP PLA — P-VbbPFUVbK I bFFFAFlibbFF oz / 




Score = 165 (24.8 bits), Expect = 6.0e-09, P = 6.0e-09




Identities = 


\ z. Z7 =5 J , r05lLlveS — lUJ/tol ( J J 




Query: 


5 


PPPEEAFFSVASPEPAG-PSGSP--ELVSSPAASSSSATALQIQPPGSPDPPP-APPAPA 


60 






PPP + ++PPGPS P +VS PS P GSP PP +PP PA 




Sbjct: 


517 


FFF V is. 1 1 brrA^lborbrrrr V b V V^lrtrlrtrvi\Z>tririr rA tr vUbir irir FEj1\o ctr trcA 


0 i u 


Query: 


61 


PASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAP LVTPSLLQMVRLRSVGAPGG 


114 






P +S P V P V PP V +P + +P V AP 




Sbjct: 


571 


DT77\ C OODDW7C DDODTT ITTi C DDDDt/[/C D D D D^ Dt7R C DDD DWV C DDDDTD177A C DD DD Ti D*7R 




Query: 


115 


APTPALGPSAPQKPLRRALSGRASPVPAP SSGLHAAVRLKACSLAASEGLSSAQPNG 


171 






+ P + P P+ SP P P S+ 5+ +S + P 




Sbjct: 


631 


SSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP — 


688 


Query: 


172 


PPEAEPRPPQS PASTAS FI FSKGSRKLQLERPVSPETQADLQRNLVAELRST SEQRPPQA 


231 






nn n nn Ti rr r\ ^ n 0 i i ill i nn 7v 

PP P PP I bK P bFb + + V+ +FPA 




Sbjct: 


689 




739 


Query: 


232 


PKKSPKAPPPVARKPSVGV--PPPASPSYPRAEPLTAPP 268 








F b F F rvr F + + FF + bF FJj++ FF 




Sbjct: 


740 


PVSSPP-PTPVSSPPALAPVSSPPSVKSSPPPAPLSSPP 777 




Score = 162 (24.3 bits), Expect = 1.3e-08, P = 1.3e-08




Identities = 


- iG/z/z (z/s), positives — yy/z/z (Job) 




Query : 


2 


ADFPPPEEAFFSVASPEPAG-PSGSPFLVSSPAASSSSATALQTQPPGSPDPPPAPPAPA 


60 






A P P SPEP PSP P + SA PPPP +PPA + 




Sbjct: 


427 


AbAPMFS Pri I F PDVbPLrLPbPbFVFArArMrMr 1 PHbr PADDY VFF 1 Fr V FGKbrrAl b 


4 86 


Query: 


61 


PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTP— 


118 






P+ A P V S PP+ VG+P P V+ S AP G+P+P 




Sbjct: 


487 


PSPQVQPPAASTPPPSLVKLS PPQAPVGSP — PPP VKTTSPPAPIGSPSPPP 


536 


Query: 


119 


nLi\3tr&r\tr\JS\ tr ljt\t\nLi OoKnO c v rnrSOUbnAAV r\LiC\.A^di4AHOllAj.udO A^ Jr LNo Ir tr Li 


174 






+ PPKPAGSPPS A S ++PP 




Sbjct: 


537 


PVS VVS PPP PVKS PPP PAP VG--SPPPPEKSPPPPAPVASPPPPVKSPPPPTLVAS PPPP 


594 


Query: 


175 


AEPRPPQSPASTASFIFSKGSRKLQLERPVSPEIQAULQRNLVAELRSISEQRPPQAPKK 


234 






+ PP +P ++ + P P A + + PP P+K 




Sbjct: 


595 


VKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTPVSSPPP-PEK 


653 


Query : 


235 


SPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPPTNGLP 273 








SP PPP P PP P+ P + + PP LP 




Sbjct: 


654 


SPPPPPPAKSTP PPEEYPTPPTSVKSSPPPEKSLP 688 





Score = 159 (23.9 bits), Expect = 2.8e-08, P = 2.8e-08
Identities = 77/264 (29%), Positives = 103/264 (39%) 



Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP — DPPPAP PAP 59 

PPP V+SP P P SP P SS ++ PP +P PP P P P 

Sbjct: 916 PPPA MVSSP-PMTPKSSPP PVVVSSPPPTVKSSPPPAPVSSPPATPKSSPPP 966 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP + P V P PV S P AP+ +P + V+ AP +P P 

Sbjct: 967 APVNLPPPEVKSSPPPTPVS-SPPPAPKSSPPPAPMSSPPPPE-VKSPPPPAPVSSPPPP 1024 

Query: 120 LGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEG LSSAQPNGPPEA 175 

+ P P+ ++ P PAP S V+ S +SPP+ 

Sbjct: 1025 VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 176 EPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP A S ++ P P A + A ++ S PP AP S 

Sbjct: 1085 PPPPVKSPPPPAPV SSPPPPIKSPPPP APVSSPPPAPVKPPS— LPPPAPVSS 1135 

Query: 236 PK — APPPVARKPSVGVPPPA-SPS YPRAEPLTAPP 268 

P P+K +PPPA S P + PP 

Sbjct: 1136 PPPVVTPAPPKKEEQSLPPPAESQPPPSFNDIILPP 1171 

Score = 143 (21.5 bits), Expect = 1.8e-06, P = 1.8e-06
Identities = 59/179 (32%), Positives = 77/179 (43%) 
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Query: 


3 


Sbjct: 


970 


Query: 


56 


Sbjct: 


1028 


Qu e r y : 


116 


Sbjct: 


1085 


Query: 


175 


Sbjct: 


1140 


Score 


= 133 


Identities ■ 


Query: 


1 


Sbjct: 


1001 


Query : 


55 


Sbjct: 


1056 


Query : 


109 


Sbjct: 


1114 


Score 


= 110 


Identities - 


Query : 


5 


Sbjct : 


1060 


Query : 


55 


Sbjct : 


1120 


Query: 


109 


Sbjct: 


1177 


Score 


= 108 


Identities = 


Query : 


114 


Sbjct: 


408 


Query : 


172 


Sbjct: 


465 


Query : 


228 


Sbjct: 


525 



+ PPPE 



S P P + P + P+ PA SS ++ PP +P PPP + 

-SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 



PP PAP SS P 



PV 



PP + 



AP 



55 

1027 

115 

1084 

174 

1139 



(20.0 bits), Expect = 2.3e-05, P = 2.3e-05 
50/132 (37%), Positives = 59/132 (44%) 



M+ PPPE 



V SP P PS 



V SP 



-AASSSSATALQIQPPGSP--DPPP 
A SS ++ PP +P PPP 



--APPAPAPASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAPLVTPSLLQMVRLRS 

+PP PAP SS P V P PV PP V +P P + 

i?KSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP— PPPIKSPPPPAP 



54 

1055 

108 

1113 



V +P AP 



P + L P AP 



(16.5 bits). Expect = 8.0e-03, P = 8.0e-03 
■■ 41/121 (33%), Positives = 49/121 (40%) 

PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP — DPPP 

PPP S V SP P PS P V SP A SS ++ PP +P PPP 



AP PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 

AP P PAP SS P V P K+ + PP E P +L + 

APVKPPSLPPPAPVSSPPPVVTPAPPKKE EQSLPPPAESQPPPSFNDIILPPTMANK 



+ P 



54 

1119 

108 

1176 



(16.2 bits), Expect = 1.3e-02, P = 1.3e-02 
46/155 (29%), Positives = 67/155 (43%) 



G PTP GP + 



P + A S 



+P+PfP + 



S ++Q 



S + A 



P+ 



171 
464 
227 
524 



PP AP SP PPPV 



SV PPP S P P+ +PP 
-SVVSPPPPVKSPPPPAPVGSPP 560 



Pedant information for DKFZphmcf1_1c23, frame 1
Report for DKFZphmcf1_1c23.1



[LENGTH] 311
[MW] 31534.58
[pI] 9.48
[KW] All_Alpha
[KW] LOW_COMPLEXITY 38.59 %



SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 

SEG xxxxxxxxxxxxxxx. . xxxxxxxxxxxx. . . .xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 



SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL

SEG xxxxxx xxxxxxxxxxx






WO 01/12659 



PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc 

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxxxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP

SEG xxxxx xxxxxxxxxxxxxxx 

PRD ccccccceeeecccchhhhhccccccchhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceeeccccccccc 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 
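The LOW_COMPLEXITY figure in the report above is the share of residues masked by SEG (the `x` characters on the SEG rows). A minimal sketch of that calculation follows. The function name `masked_percent` is an illustrative assumption, and because the column alignment of the SEG rows was lost in extraction, the masked-residue count used in the example (120 of 311) is an assumption chosen only because it reproduces the reported 38.59 %.

```python
def masked_percent(seg_mask, seq_length):
    """Percentage of residues flagged 'x' in a SEG mask, rounded to 2 decimals.

    `seg_mask` is the concatenated SEG rows; characters other than 'x'
    (dots, spaces) are treated as unmasked positions.
    """
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    masked = seg_mask.count("x")
    return round(100.0 * masked / seq_length, 2)


if __name__ == "__main__":
    # Assumed count: 120 masked residues in the 311-residue peptide.
    print(masked_percent("x" * 120, 311))
```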



(No Prosite data available for DKFZphmcf1_1c23.1)
(No Pfam data available for DKFZphmcf1_1c23.1)
DKFZphmcf1_1e15



group: transmembrane protein

DKFZphmcf1_1e15 encodes a novel 454 amino acid protein with similarity to C. elegans proteins
and transporter proteins.

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and
the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unknown
compound.

The new protein can find application as a new transporter in eukaryotic cells, e.g. in drug
transport into cells.

similarity to D-XYLOSE TRANSPORTER
membrane regions: 9

complete cDNA, complete cds, EST hits

matches cDNA encoding cell growth inhibiting factor (E12646)

Sequenced by DKFZ

Locus: unknown

Insert length: 1957 bp

Poly A stretch at pos. 1947, polyadenylation signal at pos. 1929



1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 
51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 
101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 
151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 
201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 
251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 
301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 
351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 
401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 
451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC 
501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 
551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 
601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 
651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 
701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 
751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC 
801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 
851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 
901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 
951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 
1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 
1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT 
1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC 
1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 
1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 
1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 
1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 
1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 
1401 GCTGGGTGAT GCTGGGAGCC CCTACCTCAT TGGCCTGATC TCTGACCGCC 
1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 
1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 
1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 
1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 
1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 
1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 
1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC 
1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 
1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 
1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 



BLAST Results 



Entry E12646 from database EMBL:

cDNA encoding cell growth inhibiting factor. 

Score = 3046, P = 2.2e-131, identities = 640/659 
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Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 



1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI
51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY
101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP
151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV
201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI
251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL
301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV
351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG
401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA
451 SVLI



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphmcf1_1e15, frame 1

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4, 
N = 3, Score = 441, P = 5.2e-76 

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid 
C39E9, N = 2, Score = 449, P = 8.2e-69 

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5,
N = 3, Score = 413, P = 9.1e-60

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein";
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII
project), N = 3, Score = 193, P = 2.5e-24

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER)., N
= 1, Score = 180, P = 7.9e-11



>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9 
Length = 488 

HSPs: 



Score = 449 (67.4 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 88/204 (43%), Positives = 125/204 (61%)




Query: 


58 


SALIVAVLCYINLLNYMDRFTVAVFISS YMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 


117 






+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W 




Sbjct: 


29 


AGVLTQVQTY YNISDSLGGLIQTVFLI SFMVFSPVCGYLGDRFNRKWIMI IGVGIWLGAV 


88 


Query: 


118 


LGSSFIPGEHFWLLLLTRGLVGVGEAS YSTIAPTLI ADLFVADQRSRMLS I FYFAI PVGS 


177 




LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS 




Sbjct : 


89 


LGSSFVPANHFWLFLVLRSFVGIGEAS YSNVAPSLISDMFNGQKRSTVFMT FYFAI PVGS 


148 


Query: 


178 


GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVER HSDLPPL 


233 




GLG+I GS V + G W W +RV+ G++ ++ L L EP RGA ++ D+ 




Sbjct: 


149 


GLGFI VGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVVVT 


208 


Query : 


234 


NPTSWWADLRALARNLI FGLITCLTG 259 








T++ DL L + L-t- C G 




Sbjct: 


209 


TNTTYLEDLVILLKTPT — LVACTWG 232 




Score = 267 (40.1 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 74/212 (34%), Positives = 113/212 (53%)




Query: 


249 


LIFGLITCLTGVLGVGLGVEISRRL RHSNPRADPLVCATGLLGSAPFLFLSL 


300 




L FG IT G++GV G +S+ L R RA PLV G L +APFL + + 




Sbjct : 


277 


LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 


336 


Query: 


301 


ACARGSIVATYIFI FIGETLLSMNWAI VADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 


360 
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S+V YI IF G T + NW + D+L V+ P RRSTA ++ +++SHL GDA 



Sbjct: 


337 


IFGDKSLVLLYIMIFFGITFMCFNWGLNIDMLTTVIHPNRRSTAFS YFVLVSHLFGDASG 


396 


Query: 


361 


PYLIGLISDRLRRN — WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRR — 


416 






PYLIGLISD +R +P ++ +L + C + L + +++ + +DR+ 




Sbjct: 


397 


PYLIGLISDAIRHGSTYPKD QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 


453 


Query: 


417 


RAQLHVQGLLHEA — GSTD — DRI VVPQRGRSTRV 447 








RA++ + L + STD +RI + S+R+ 




Sbjct: 


454 


RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 488 




Score = 70 (10.5 bits), Expect = 5.9e-24, Sum P(2) = 5.9e-24
Identities = 25/89 (28%), Positives = 41/89 (46%)




Query: 


62 


VAVLCYINLLNYMDRFTVAVFI SSYMVLAPVFGYLGDRYNRKYLMCGGI AFWSLVT — LG 


119 






V L +NLLNY+DR+TVA ++ + LG +L+ +S V LG 




Sbjct: 


11 


VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI — SFMVFSPVCGYLG 


68 


Query: 


120 


SSFIPGEHFWLLLLTRGLVGVGEASYSTIAP 150 








F W++++ G + +G S+ P 




Sbjct: 


69 


DRF NRKWIMIIGVG-IWLGAVLGSSFVP 95 





Pedant information for DKFZphmcf1_1e15, frame 1



Report for DKFZphmcf1_1e15.1



[LENGTH] 454

[MW] 49013.35

[pI] 7.66

[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4 2e-51

[BLOCKS] BL01022D 

[PROSITE] MYRISTYL 11 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PROKAR_LIPOPROTEIN 1 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] TRANSMEMBRANE 8 

[KW] LOW_COMPLEXITY 15.42 % 



SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL 

SEG xxxxkxxkxxxxxxxx 

PRD cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 



SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS 

SEG 

PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc 

MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 



SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG 

SEG xxxxxxxxxxxx 

PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce 

MEM MMMMMMMMMMMMMMMM XMMMMMMMMMMMMMMM 



SEQ YIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPLNPTSWWA 

SEG xxxxxxxxxxxxx 

PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh 

MEM MMjyMMMMMM 



SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec 

MEM MMMMMMMMMM1MMMMMMMMMMMMMM 

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS

SEG 

PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc 

MEM MMMM MMMMMMMMMMMMMKMMMMMMM MMMMMMMMMMMMM 

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL 

SEG xxxxxxxxxxxxx 

PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh 

MEM MMMMMMMM MM 



SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVLI
SEG

PRD hhhhhhhhccccceeeeeeccccccceeeeeccc 
MEM MMMMM^1M4MM^1MMMMMMMMMMMM^^M^M 



Prosite for DKFZphmcf1_1e15.1

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004
PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 32->38 MYRISTYL PDOC00008
PS00008 52->58 MYRISTYL PDOC00008
PS00008 139->145 MYRISTYL PDOC00008
PS00008 176->182 MYRISTYL PDOC00008
PS00008 252->258 MYRISTYL PDOC00008
PS00008 262->268 MYRISTYL PDOC00008
PS00008 266->272 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 305->311 MYRISTYL PDOC00008
PS00008 397->403 MYRISTYL PDOC00008
PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013



(No Pfam data available for DKFZphmcf1_1e15.1)

DKFZphmcf1_1g13



group: mammary carcinoma derived 



DKFZphmcf1_1g13 encodes a novel 573 amino acid protein with very weak similarity to the human
KIAA0543 protein and Musca domestica Hermes transposase.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of mammary carcinoma- 
specific genes. 



similarity to KIAA0766 



complete cDNA, complete cds, few EST hits

on genomic level encoded by AC005020, no splicing, genomic? 



Sequenced by DKFZ 
Locus: unknown



Insert length: 2210 bp 

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176



1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA
51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC
101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG
151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT
201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT
251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT
301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT
351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG
401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC
451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTACAC GGCTGCAGTC
501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT
551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA
601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA
651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT
701 GGAAACATTG TAAAGGAATT TCAAGTGATG GAACAGCAAA TATGACCGGA
751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC
801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA
851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT
901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC
951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT
1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG
1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA
1101 AGACGACATT TGGGTAACAA AATTGGCATA TTTAAGTGAT ATTTTTGGCA
1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT
1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA
1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT
1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA
1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA
1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA
1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG
1501 GAGCCTGAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT
1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA
1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA
1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT
1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG
1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA
1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT
1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA
1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA
1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA
2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG
2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG
2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT
2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG
2201 AAAAAAAAAA



BLAST Results 



Entry AC005020 from database EMBL: 

Homo sapiens clone GS259H13; HTGS phase 1, 4 unordered pieces. 
Score = 9110, P = 0.0e+00, identities = 1822/1822 
Medline entries

No Medline entry 

Peptide information for frame 1 



ORF from 94 bp to 1812 bp; peptide length: 573 
Category: similarity to unknown protein 



1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL
51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK
101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI
151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK
201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS
251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE
301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI
351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP
401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI
451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI
501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM
551 RVALSSCVPD WKELMNRQAH PSH
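
The ORF coordinates and the stated peptide length can be cross-checked with simple arithmetic. The following is a minimal Python sketch, assuming the coordinates are 1-based and inclusive and that the stated range excludes the stop codon; the helper name `orf_codons` is hypothetical and not part of the original annotation pipeline.

```python
def orf_codons(start: int, end: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: the range covers only coding triplets (no stop codon).
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a positive multiple of 3")
    return length // 3


# "ORF from 94 bp to 1812 bp; peptide length: 573"
print(orf_codons(94, 1812))
```

Under these assumptions the codon count agrees with the peptide length given for this clone.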

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW: 

gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens

PAC clone DJ0751H13 from 7q35-qter, complete sequence.

Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179

Entry MD36211_1 from database TREMBL:

product: "Hermes transposase"; Musca domestica Hermes transposase

gene, complete cds.

Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465



Alert BLASTP hits for DKFZphmcf1_1g13, frame 1

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P
= 1.1e-23

>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds.
Length = 607

HSPs:

Score = 300 (45.0 bits), Expect = 1.1e-23, P = 1.1e-23
Identities = 120/485 (24%), Positives = 229/485 (47%) 

Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDE
           CM+ ++R + + L+ + LS + +RI +I ++L L R + +++ LD+
Sbjct: 124

Query: 148
           +A LLV++R V + + EDLL +NL H + G + LE+ L L+ +
Sbjct: 183

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWN--HC--FIHREALVSKEISPSLMDVL 261
           G+++ T M G++S L + E + WN H F+H E L S ++ + ++
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWNVIHYSGFLHLELLSSYDVDVN--QII 298

Query: 262
           IK + + +E H + + WL +GK L ++ LR E+
Sbjct: 299

Query: 321
           FLV + + F D W+ +L DI L ELS +++ +HI F+
Sbjct: 359
Query: 381 TLLLWQARLKSNRPSYYMFPTLLQHIEE----NIINEDCLKEIKLEILLHLTSLSQTFNY 436
           L L+Q ++ + FP L + ++E N +E + ++++ L + F
Sbjct: 418 KLNLFQRHIEEKNLTD--FPALREVVDELKQQNKEDEKIFDPDRYQMVI--CRLQKEFER 473

Query: 437 YFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILSL 495
           +F + +F +K+++ + +PF F+ + I + +E L +L ++ L N Y+I L
Sbjct: 474 HFKDLRF--IKKDLELFSNPFNFKPEYAPISVRVE------LTKLQANTNLWNEYRIKDL 525

Query: 496 SAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA---PDMR 551
           F+ + + +P++ + + F + +CE FS LTR + L R
Sbjct: 526 GQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALFR 585

Query: 552 VALSSCVPDWKELMNRQAHPSH 573
           VA + P W +L+ R+ + S +
Sbjct: 586 VATTEMEPGWDDLV-RERNESN 606

Score = 290 (43.5 bits), Expect = 1.5e-22, P = 1.5e-22 
Identities = 120/485 (24%), Positives = 228/485 (47%) 

Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147
           CM+ ++R + + L+ + LS + +RI +I ++L L R + ++ + LD+
Sbjct: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182

Query: 148 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205
           +A LLV++R V + + EDLL +NL H + G + LE+ L L+ +
Sbjct: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES--LQTAGLSLQR 240

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNHCFIHREALVSKEISPSLMDV-LKNA 264
           G+++ T M G++S L + E + WN IH + E+ S DV +
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWN--VIHYSGFLHLELLSSY-DVDVNQI 297

Query: 265 VKTVN----FIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNE 319
           + T + + IK + + +E H + + WL +GK L ++ LR E
Sbjct: 298 INTISEWIVLIKTRGVRRPEFQTLLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKE 357

Query: 320 IYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQ 379
           + FLV + + F D W + +L DI L ELS +++ +HI F+
Sbjct: 358 MEAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFE 416

Query: 380 KTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIKL----EILLHLTSLSQTFN 435
           L L+Q ++ + FP L + + + E + + ++K+ ++L + F
Sbjct: 417 VKLNLFQRHIEEKNLTD--FPALREVVDE--LKQQNKEDEKIFDPDRYQMVICRLQKEFE 472

Query: 436 YYFPEEKFESLKENIWM-KDPFAFQNPESIIELNLEPEEENELLQLSSSFTLKNYYKILS 494
           +F + +F +K+++ + + PF F+ + I + +E L +L ++ L N Y+I
Sbjct: 473 RHFKDLRF--IKKDLELFSNPFNFKPEYAPISVRVE------LTKLQANTNLWNEYRIKD 524

Query: 495 LSAFWIKIK-DDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKKRNRLNSA---PDM 550
           L F+ + + +P++ + + F + +CE FS LTR + L
Sbjct: 525 LGQFYAGLSAESYPIIKGVACKVASLFDSNQICEKAFSYLTRNQHTLSQPLTDEHLQALF 584

Query: 551 RVALSSCVPDWKELMNRQAHPSH 573
           RVA + P W +L+ R+ + S+
Sbjct: 585 RVATTEMEPGWDDLV-RERNESN 606

Pedant information for DKFZphmcf1_1g13, frame 1



Report for DKFZphmcf1_1g13.1

[LENGTH] 573

[MW] 66276.85

[pI] 5.82

[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens
mRNA for KIAA0766 protein, complete cds. 1e-18
[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.90 % 

SEQ MTPESRDTTDLSPGGTQEMEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA 
SEG xxxxxxx 



PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh 
SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDMVRTIFDDKSADKLRTIPLSDNTISRRIC
SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhcccccceeeeecccchhhhhhh 

SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS 

SEG 

PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce 

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH 

SEG 

PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee 

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE 

SEG 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhhh 

SEQ VRWLSQGKVLSRVYELRNEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhchhhhhcccceeehhhhhhhhhhhhhhhhhhh 

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK 

SEG xxxxx 

PRD hhccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhcchhhhhh 

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL 

SEG xxxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh 

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK 

SEG xxx xxxxxxxxxxx 

PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh 

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH 

SEG 

PRD hcccccccccceeeccccccchhhhhhhccccc 



Prosite for DKFZphmcf1_1g13.1

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005
PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005
PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006
PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007
PS00008 137->143 MYRISTYL PDOC00008
PS00008 273->279 MYRISTYL PDOC00008
PS00008 289->295 MYRISTYL PDOC00008



(No Pfam data available for DKFZphmcf1_1g13.1)

DKFZphtes3_14g5



group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to the murine cell
growth regulating nucleolar protein LYAR.

The novel protein is very similar to the murine Ly-1 antibody reactive clone protein (LYAR). It
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate groups of
an ATP/GTP nucleotide), but not the zinc finger motif or the nuclear localization signals of
LYAR.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440
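
The two 3' end positions above can be reproduced from the tail of the insert sequence. The following is a minimal Python sketch, assuming that the canonical hexamer AATAAA is the polyadenylation signal meant here, that a poly A stretch is a run of at least ten A residues, and that the stated positions are zero-based offsets; the last point is an inference from the sequence, not something the text states. The names `TAIL`, `TAIL_START` and `zero_based` are illustrative only.

```python
import re

# Last 103 nt of the DKFZphtes3_14g5 insert (1-based positions 1401..1503),
# transcribed from the sequence listing.
TAIL_START = 1401
TAIL = (
    "AGTTAAGAAAATATATTTTTGGTATAACTTTTATGAGAAAAATAAAATAT"
    + "ATTCTGGTCCAAACTTCAAA" + "A" * 30
    + "AAA"
)


def zero_based(offset_in_tail: int) -> int:
    """Convert an offset within TAIL to a zero-based offset in the full insert."""
    return (TAIL_START - 1) + offset_in_tail


# Assumption: AATAAA is the signal; a poly A stretch is >= 10 consecutive A.
signal = zero_based(TAIL.find("AATAAA"))
poly_a = zero_based(re.search(r"A{10,}", TAIL).start())
print(signal, poly_a)
```

Under these assumptions the sketch returns the same two positions as the annotation line, which suggests, but does not prove, that the original tool reported zero-based offsets for these features.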



1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT
51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT
101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT
151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA
201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG
251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA
301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA
351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA
401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG
451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG
501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT
551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC
601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT
651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA
701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA
751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG
801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA
851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA
901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG
951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA
1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG
1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG
1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT
1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG
1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG
1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA
1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG
1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA
1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT
1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
1501 AAA



BLAST Results 



No BLAST result 



Medline entries 



93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is 
involved in cell 

growth regulation. 
Peptide information for frame 3



ORF from 144 bp to 1280 bp; peptide length: 379 
Category: strong similarity to known protein 
Classification: Cell division 
Prosite motifs: ATP_GTP_A (60-68) 



1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK
51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ
101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ
151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK
201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK
251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED
301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS
351 EEELLVIFNK KISKNPTFKL LKDKVKLVK



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_14g5, frame 3

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N = 
1, Score = 1410, P = 2.7e-144 

SWISSPROT: YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN 
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35 

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding 
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic 
sequence, complete sequence., N = 3, Score = 139, P = 4e-15 

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11



>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388 



HSPs : 



Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144 
Identities = 275/388 (70%), Positives = 317/388 (81%) 



Query: 1 MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 60 

MVFFTCNACGESVKKIQVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG 
Sbjct: 1 MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG 60 

Query: 61 KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 120 

KGYE KTHKGD KQQAWIQKI+ELIK+PNVSPKVRELL+QISAFDNVP K KAKFQNWMKN 
Sbjct: 61 KGYEAKTHKGDAKQQAWIQKINELIKKPNVSPKVRELLQQISAFDNVPIKKAKFQNWMKN 120 

Query: 121 SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEIS-TKVPASKVKDAVE 179 

SLKVH++S+L+QVW+IFSEAS+SE ++Q Q P H A PHAE+ TKVP++K E 
Sbjct: 121 SLKVHSDSVLEQVWDIFSEASSSE QDQQQPPSH-TAKPHAEMPITKVPSAKTNGTTE 176 

Query: 180 QQGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVP 239 

+Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA EA GE+ 
Sbjct: 177 EQTEAKKNKRERKEERQKNRKKEKKELKLENHQENLRGQKPKKRKKNQEAGHEAAGEDGA 236 

Query: 240 EANG SAGKRSKKKKQRKDSASEEEA RVGAGKRKR-RHSEVETDSKKKKM 287 

+ +G G+ S++ R E+ A + AGKRKR +HS E+ KKKKM 

Sbjct: 237 DGSGPPEKKKAQGGQASEEGADRNGGPGEDRAEGQTKTAAGKRKRPKHSGAESGYKKKKM 296 

Query: 288 KLPEHPEGGEPEDDEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEH 347 

KLPE PE GE +D EAP+KGKFNWKGTIKA+LKQAPDNEI++KKL+KKV+AQY+ V ++ 
Sbjct: 297 KLPEQPEEGEAKDHEAPSKGKFNWKGTIKAVLKQAPDNEISVKKLKKKVIAQYHAVMNDT 356 

Query: 348 HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK 379 

EEELL IFN+KIS+NPTFK+LKD+VKL+K 
Sbjct: 357 SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK 388
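
The percentages on the "Identities / Positives" summary line follow directly from the two fractions. The following is a minimal Python sketch for parsing such a line, assuming the layout shown in this report; the function name `parse_hsp_stats` is hypothetical. The reported percentages appear to be truncated rather than rounded (275/388 is about 70.9% but is printed as 70%), so the sketch uses integer division; this is an observation from the numbers in this document, not a documented property of the search program.

```python
import re


def parse_hsp_stats(line: str) -> dict:
    """Extract identity and positive counts from an HSP summary line."""
    m = re.search(
        r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)", line
    )
    if m is None:
        raise ValueError("not an Identities/Positives line")
    ident, length, pos, length2 = map(int, m.groups())
    if length != length2:
        raise ValueError("alignment lengths differ between the two fractions")
    return {
        "identities": ident,
        "positives": pos,
        "length": length,
        # Integer division reproduces the truncated percentages in the report.
        "pct_identity": 100 * ident // length,
        "pct_positive": 100 * pos // length,
    }


stats = parse_hsp_stats("Identities = 275/388 (70%), Positives = 317/388 (81%)")
print(stats["pct_identity"], stats["pct_positive"])
```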
Report for DKFZphtes3_14g5.3

[LENGTH] 379

[MW] 43634.03

[pI] 9.59

[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11

[BLOCKS] BL00603D Thymidine kinase cellular-type proteins 

[BLOCKS] BL00530C 
[PROSITE] ATP_GTP_A 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 18.73 %



SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 

SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ 

SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 

SEG . . xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK 

SEG XXXXX 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 



Prosite for DKFZphtes3_14g5.3 
PS00017 60->68 ATP_GTP_A PDOC00017
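
The ATP_GTP_A hit can be reproduced with a regular expression. The following is a minimal Python sketch, assuming the PROSITE PS00017 consensus [AG]-x(4)-G-K-[ST] translated into the regex `[AG].{4}GK[ST]`; the variable names are illustrative. The reported end coordinate appears to be one past the last matched residue, which is an inference from comparing the match with the "60->68" line above.

```python
import re

# PROSITE PS00017 (ATP_GTP_A, P-loop) consensus: [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# First 70 residues of the DKFZphtes3_14g5 peptide, from the listing above.
N_TERM = (
    "MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDF"
    "WGDDYKNHVKCISEDQKYGGKGYEGKTHKG"
)

m = P_LOOP.search(N_TERM)
start = m.start() + 1  # 1-based position of the first matched residue
end = m.end() + 1      # assumption: the report prints one past the last residue
print(f"{start}->{end} {m.group()}")
```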



(No Pfam data available for DKFZphtes3_14g5.3) 
DKFZphtes3_14h21



group: nucleic acid management 

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in transcription
and replication of viral single-stranded RNA genomes. In the members of the largest subgroup, the
DEAD and DEAH box proteins, the unwinding activity depends strongly on ATP
hydrolysis. The novel protein contains a DEAD-box motif and an ATP/GTP-binding site motif A (P-loop)
and is a new member of this subgroup.

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 

Sequenced by BMFZ 

Locus: unknown

Insert length: 2200 bp

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140
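
The note that the start codon at bp 33 matches the consensus ACNatg can be checked against the first bases of the insert. The following is a minimal Python sketch, assuming 1-based coordinates and reading "ACNatg" as the three bases upstream of the ATG being A, C and any nucleotide; the helper name `matches_acn_atg` is hypothetical.

```python
import re

# First 50 nt of the DKFZphtes3_14h21 insert, transcribed from the listing below.
FIRST_50 = "CAACGACGTCGGACGCGCCCCTTCTTGGAACAATGTCCCACCACGGAGGA"
START = 33  # 1-based position of the A in the ATG start codon


def matches_acn_atg(seq: str, start: int) -> bool:
    """True if the three bases before `start` plus the codon read AC[ACGT]ATG."""
    window = seq[start - 4 : start + 2]  # 1-based positions start-3 .. start+2
    return re.fullmatch(r"AC[ACGT]ATG", window) is not None


print(FIRST_50[START - 4 : START + 2], matches_acn_atg(FIRST_50, START))
```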



1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT
951 CAACCCAGCC TTAAAGGTCA AAGGAATAGA CCCGGCATGT TAGTTCTAAC
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294) 
DEAD_ATP_HELICASE (394-403)



1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR
51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT
101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ
151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK
201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY
251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG
301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY
351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK
401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV
451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA
501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG
551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS
601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH
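
The relation between the insert sequence and the frame 3 peptide can be illustrated by translating the first codons of the ORF. The following is a minimal Python sketch using the standard genetic code; the names `CODON_TABLE`, `translate` and `ORF_START` are illustrative, and the 30 nt fragment is transcribed from positions 33 to 62 of the sequence listing under the assumption that the ORF coordinates are 1-based.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, usable, 3))


# Nucleotides 33..62 of the DKFZphtes3_14h21 insert (start of the ORF).
ORF_START = "ATGTCCCACCACGGAGGAGCTCCCAAGGCC"
print(translate(ORF_START))
```

The translated fragment can be compared with the first ten residues of the peptide listed above.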



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21 , frame 3 

TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid
Y54G11A, N = 1, Score = 1008, P = 1.1e-101

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like
protein."; S.pombe chromosome II pi p8B7., N = 1, Score = 971, P =
9.1e-98

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1,
Score = 970, P = 1.2e-97

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces
pombe), N = 1, Score = 961, P = 1e-96

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91



>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid 
Y54G11A 

Length = 504 



HSPs: 



Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101
Identities = 211/473 (44%), Positives = 298/473 (63%)



Query: 174
           PI ++ YK +S
Sbjct: 23

Query: 234
           IP P +F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT
Sbjct: 76

Query: 294 LCYLMPG FIHLVLQPSL KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348
           L +L+P +H+ Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC
Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408
+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255 

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468 

I +IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW---SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524

+ ++ + FL + + K+I+FV K +ADHLSSD + I+ + LHG R Q

Sbjct: 316 PHDSRFLRVCEIVNFLTAAHGQNYKMIIFVKSKVMADHLSSDFCMKGINSQGLHGGRSQS 375

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644 

G +++ L ND LI ILE++ Q +P++L AE++ K + R RP R 

Sbjct: 436 GEAMSFLWWNDRSNFEGLIQILEKSEQEVPDQLRRDAEKYRL KCQSGRDGPRPSFRN 492 

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 



Pedant information for DKFZphtes3_14h21, frame 3 



Report for DKFZphtes3_14h21.3

[LENGTH] 648
[MW] 72873.51
[pI] 8.84
[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 4e-72
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204W] 2e-70
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 2e-49
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 9e-45
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194C] 2e-28
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL0S4c] 5e-10
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-96
[PIRKW] RNA binding 3e-87
[PIRKW] DEAD box 5e-50
[PIRKW] transmembrane protein 4e-27
[PIRKW] DNA binding 3e-67
[PIRKW] recF recombination pathway 3e-10
[PIRKW] ATP 4e-96
[PIRKW] purine nucleotide binding 5e-50
[PIRKW] P-loop 4e-96
[PIRKW] hydrolase 9e-45
[PIRKW] protein biosynthesis 5e-50
[PIRKW] ATP binding 1e-61
[SUPFAM] WW repeat homology 8e-88
[SUPFAM] DEAD/H box helicase homology 4e-96
[SUPFAM] unassigned DEAD/H box helicases 7e-87
[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96
[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43
[SUPFAM] recQ protein 3e-10
[SUPFAM] Bloom's syndrome helicase 5e-07
[SUPFAM] translation initiation factor eIF-4A 5e-50
[SUPFAM] recQ helicase homology 3e-10
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-88
[PROSITE] DEAD_ATP_HELICASE 1
[PROSITE] ATP_GTP_A 1

[PFAM] Helicases conserved C-terminal domain 

[PFAM] KH domain family of RNA binding proteins 

[PFAM] DEAD and DEAH box helicases 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.49 % 

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG XXXXXXXXXXXXXXXXX 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT 

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG 

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD 

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF 

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI 

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS 

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcccccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

Prosite for DKFZphtes3_14h21 . 3 

PS00017 286->294 ATP_GTP_A PDOC00017 

PS00039 394->403 DEAD ATP HELICASE PDOC00039 



Pfam for DKFZphtes3_14h21 . 3 
HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296 

HMM UPMLQHIDwdPWpqpPQd. . PrALILAPTRELAMQIQEEcRkFgkHMng 

L+P ++H+ +P +++ Q+ P +L+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343 

HMM IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrleM 
+R++C+YGG N ++Q+++L++G+ +I+IATPGRL D+ +++ ++L++I++ 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392 

HMM LVMDEADRMLDMGFIDQIRrlMrqIPMpwNRQTMMFSATMPdelqELARr 

LV+DEAD+MLDMGF++QI++I+ ++ ++RQT+M SAT+P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR--PDRQTVMTSATWPHSVHRLAQS 440 
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HMM 
FMRNPIRInld.MdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 
++++P + ++ D +++ +KQ +I+ E++K + ++++ 
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 482 

HMM_NAME KH domain family of RNA binding proteins 

HMM *rIiIPedhMGMIIGKGGsNIRqIREEYgvrINIPdecCeDstdRIITIt 
+ + ++++G++IG+GGS I++I++ ++++I I++E+ + + + I 
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-PESLVKIF 115 

HMM G* 
G 
Query 116 G 116 

HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl....GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTD 
+ +++ L+ + +I+V ++HGD++Q++R++++++F++G+ ++LI+TD 
Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545 

HMM VggRGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG* 
+++RG+D+ DV HV+N+D+P+N+E Y++RIGRTGR+G 
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG 582 
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DKFZphtes3_14pl4 



group: testes derived 

DKFZphtes3_14pl4 encodes a novel 159 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 



complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3969 bp 

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927 



1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 
101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 
151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 
201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 
251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 
301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGTTGGGCT TGACATTCAG 
351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 
401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 
451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT 
501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 
551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 
601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 
651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 
701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 
751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 
801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 
851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATTCAGAAAA 
901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 
951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA 
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCTTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA 
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT 
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAG GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG 
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC 
2001 ATGCCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG 
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGGA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGCC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC 
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA 






2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT 
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG 
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA 
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG 
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 
3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT 
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG 
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG 
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 
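
The ORF coordinates and the peptide length quoted above are linked by simple arithmetic. The following is a minimal Python sketch of that check, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; the function name is illustrative and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: the stop codon is not included in the quoted range.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 216 bp to 692 bp, as reported for frame 3
print(orf_peptide_length(216, 692))  # 159
```

Under these assumptions the 477 bp span corresponds to 159 codons, which agrees with the reported peptide length.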



1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR 
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG 
151 SGRSYSSLQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14pl4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_14pl4, frame 3 



Report for DKFZphtes3_14pl4 . 3 



[LENGTH] 159 

[MW] 17778.55 

[pI] 5.74 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04 

[KW] Alpha_Beta 



SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM 

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS 

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccccc 
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SEQ EGSCPAAWPFLLRIWMGVQAPWGFKPLMAGSGRSYSSLQ 
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc 



(No Prosite data available for DKFZphtes3_14pl4 . 3) 
(No Pfam data available for DKFZphtes3_14pl4 . 3) 
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DKFZphtes3_14p7 



group: testes derived 



DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 



1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAAATAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA 
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA 
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG 
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA 
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG 
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT 
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT 
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT 
901 TGATTCATCA TTAGTAAGAA CTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 
1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGIGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG 
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA 
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA 
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT 
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCAGTAACAA CATGGATGGA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT 
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC 
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTGCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTG CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT 
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT 
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA 
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT 
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 
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No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 



1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS 
51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD 
101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI 
151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN 
201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI 
251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR 
301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS 
351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT 
401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA 
451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV 
501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ 
551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD 
601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLLLSSFL 
651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP 
701 SF 


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2 

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, 
complete cds., N = 2, Score = 97, P = 0.00039 



>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete 
cds. 

Length = 772 

HSPs: 

Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04 
Identities = 45/163 (27%), Positives = 77/163 (47%) 
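
The percentages printed next to the identity and positive counts in these HSP headers appear to be truncated rather than rounded (45/163 is about 27.6 % but is shown as 27 %). A minimal Python sketch of that reading follows; the function name is hypothetical and the truncation rule is an inference from the figures in this report, not a documented property of the BLAST version used.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting rule)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Identities = 45/163, Positives = 77/163
print(truncated_percent(45, 163), truncated_percent(77, 163))  # 27 47
```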

Query: 442 LTRVLANIAIHPGVGPVLAANPGIVGLLLTTLEYKSLDDCEELVINATATINNLSYYQVK 501 

L +++ NI+ H G P VG L + S D+ EE VI T+ NL+ + 

Sbjct: 483 LMKMIRNISQHDG--PTKNLFIDYVGDLAAQI SSDEEEEFVIECLGTLANLTIPDLD 537 

Query: 502 -NSIIQDKKLYIAELLLKLLVSNNMDG-ILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMA 559 

++++ KL + L KL D +LE V + G +S D + ++ + ++ 

Sbjct: 538 WELVLKEYKL-VPFLKDKLKPGAAEDDLVLEVVIMIGTVSMDDSCAALLAKSGIIPALIE 596 

Query: 560 LLDAQHQDICFSACGVLL NLTVDKDKR-VILKEGGGIKKLVDCLRD 604 

LL+AQ +D F C ++ + + R VI+KE L+D + D 

Sbjct: 597 LLNAQQEDDEF-VCQIIYVFYQMVFHQATRDVIIKETQAPAYLIDLMHD 644 

Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04 
Identities = 42/178 (23%), Positives = 82/178 (46%) 

Query: 169 KLAKIILALKVSRKNLLNVCK-LIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNME 227 

K K L V ++ LL V L+ ++ + + + ++N +I+ L++ L + N E 

Sbjct: 263 KTFKKYQGLVVKQEQLLRVALYLLLNLAEDTRTELKMRNKNIVHMLVKALDRD NFE 318 

Query: 228 AFLYCMGSIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVT 287 

+ + +K +S + N+M+ VE L+ +I +E++ L + + 

Sbjct: 319 LLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDL LNITLR 366 

Query: 288 ATLRNLVDSSLVRSKFLNISALPQLCTAM--EQYKGDKDVCT--NIARI--FSKLTSYRD 341 

L D+ L R+K + + LP+L + E YK +C +I+ F + +Y D 
Sbjct: 367 LLLNLSFDTGL-RNKMVQVGLLPKLTALLGNENYK-QIAMCVLYHISMDDRFKSMFAYTD 424 
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Query: 342 CCTAL 346 
C L 

Sbjct: 425 CIPQL 429 

Score = 69 (10.4 bits), Expect = 2.6e+00, Sum P(2) = 9.2e-01 
Identities = 35/146 (23%), Positives = 70/146 (47%) 

Query: 512 IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 571 

I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+ 

Sbjct: 304 IVHMLVKALDRDNFELLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDLLNI 363 

Query: 572 ACGVLLNLTVDKDKRVILKEGGGIKKLVDCLRDLGPTDW-QLACLVCKTLWNFSENITNA 630 

+LLNL+ D R+ + G + KL L G++ Q+A +C L++ S + 

Sbjct: 364 TLRLLLNLSFDTGLRNKMVQVGLLPKLTALL GNENYKQIA--MC-VLYHISMD-DRF 416 

Query: 631 SSCFGNEDT-NTLLLLLSSFLDEELALD 657 

S F D L+ +L DE + L+ 

Sbjct: 417 KSMFAYTDCIPQLMKMLFECSDERIDLE 444 




Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03 





Identities = 18/58 (31%), Positives = 30/58 (51%) 

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241 

LI +++RN N + L+ N + + L +L VLR + +L TN+ +C S G 

Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHG 212 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 26/122 (21%), Positives = 53/122 (43%) 

Query: 283 LVQVTATLRNL VDSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTS 338 

+++ TL NL +D LV ++ +P L ++ + D+ + I S 

Sbjct: 521 VIECLGTLANLTIPDLDWELVLKEY KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 576 

Query: 339 YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 398 

D C AL + S + L+N Q+ + V +++++ + + R+ KE + 

Sbjct: 577 MDDSCAALLAKSGIIPALIELLNAQQEDDEFVCQIIYVFYQMVF-HQATRDVIIKETQAP 635 

Query: 399 QTLLSL 404 

L+ L 

Sbjct: 636 AYLIDL 641 




Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 44/177 (24%), Positives = 79/177 (44%) 

Query: 481 CE-ELVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 537 

CE E ++N T + NLS+ ++N ++Q ++ LLL + N IA+V + 

Sbjct: 355 CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGNENYKQI--AMCVLYH 409 

Query: 538 LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 596 

+SD F +++ML+ +I +NL +K ++ EG G+K 

Sbjct: 410 ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRNVQLICEGNGLK 469 

Query: 597 KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEELAL 656 

L+ R L D L+ K + N S + + + F + L +SS +EE + 

Sbjct: 470 MLMK--RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 522 

Query: 657 D 657 
+ 

Sbjct: 523 E 523 




Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02 
Identities = 20/66 (30%), Positives = 34/66 (51%) 

Query: 304 LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 362 

LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N+I + 

Sbjct: 171 LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 229 

Query: 363 YQKKQDL 369 

K+ +L 

Sbjct: 230 ELKRHEL 236 





Pedant information for DKFZphtes3_14p7, frame 2 



Report for DKFZphtes3_14p7 .2 



[LENGTH] 708 

[MW] 79266.35 

[pI] 6.57 






[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04 

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04 

[BLOCKS] BL00923F Aspartate and glutamate racemases proteins 

[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 7.49 % 





SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH 

SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII 

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV 

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA 

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH 

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE 

SEG 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV 

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG 

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF 

SEG xxx 

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 
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PS00001 


206->210 ASN_GLYCOSYLATION PDOC00001 
PS00001 212->216 ASN_GLYCOSYLATION PDOC00001 
PS00001 311->315 ASN_GLYCOSYLATION PDOC00001 
PS00001 385->389 ASN_GLYCOSYLATION PDOC00001 
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001 
PS00001 500->504 ASN_GLYCOSYLATION PDOC00001 
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001 
PS00001 584->588 ASN_GLYCOSYLATION PDOC00001 
PS00001 628->632 ASN_GLYCOSYLATION PDOC00001 
PS00001 632->636 ASN_GLYCOSYLATION PDOC00001 
PS00001 635->639 ASN_GLYCOSYLATION PDOC00001 
PS00005 173->176 PKC_PHOSPHO_SITE PDOC00005 
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 295->298 PKC_PHOSPHO_SITE PDOC00005 
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005 
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005 
PS00005 421->424 PKC_PHOSPHO_SITE PDOC00005 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006 
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006 
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006 
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006 
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006 
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006 
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006 
PS00006 654->658 CK2_PHOSPHO_SITE PDOC00006 
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006 
PS00008 17->23 MYRISTYL PDOC00008 
PS00008 64->70 MYRISTYL PDOC00008 
PS00008 144->150 MYRISTYL PDOC00008 
PS00008 384->390 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 473->479 MYRISTYL PDOC00008 
PS00008 533->539 MYRISTYL PDOC00008 
PS00008 580->586 MYRISTYL PDOC00008 
PS00008 641->647 MYRISTYL PDOC00008 
PS00009 67->71 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_14p7 . 2 ) 
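
Prosite site listings of this kind can be reproduced by scanning the peptide with the published consensus pattern. The following is a minimal Python sketch for the N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, applied to a short fragment of the Pedant sequence above. The helper name `scan_sites` and the coordinate convention (end printed as one past the last residue of the motif) are assumptions inferred from the listing, not a description of the Pedant software.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}
# A lookahead is used so that overlapping sites are all reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def scan_sites(seq: str, first_pos: int = 1, motif_len: int = 4):
    """Return (start, end) pairs; end = start + motif_len (assumed convention)."""
    return [
        (m.start() + first_pos, m.start() + first_pos + motif_len)
        for m in N_GLYC.finditer(seq)
    ]


# Residues 199-219 of the 708-residue Pedant sequence (SEQ lines above)
fragment = "KISRNEKNDSLIQNDSILESL"
for start, end in scan_sites(fragment, first_pos=199):
    print(f"PS00001 {start}->{end} ASN_GLYCOSYLATION PDOC00001")
```

For this fragment the sketch reports sites at 206->210 and 212->216.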
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DKFZphtes3_15al3 



group: testes derived 

DKFZphtes3_15al3 encodes a novel 387 amino acid protein with weak similarity to S.cerevisiae 
Hop1. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S.cerevisiae Hop1 

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST 
hits 

S.cerevisiae Hop1p is a meiosis-specific protein 

Sequenced by GBF 

Locus: unknown 

Insert length: 1848 bp 

Poly A stretch at pos. 1766, no polyadenylation signal found 

1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT 

51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA 

101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG 

151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT 

201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG 

251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG 

301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG 

351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC 

401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT 

451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA 

501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC 

551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA 

601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC 

651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA 

701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT 

751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA 
801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA 

851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA 
901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT 
951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG 

1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA 

1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA 

1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA 

1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT 

1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA 

1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG 

1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT 

1351 CCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA 

1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT 

1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC 

1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT 

1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA 

1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA 

1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT 

1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT 

1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 

1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 
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Peptide information for frame 2 



ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 
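
The stated ORF start can be checked by translating the cDNA from position 116 with the standard genetic code. The following is a minimal Python sketch using only the row of the sequence listing that begins at position 101; the variable and function names are illustrative, and the code table is the standard one written in the conventional TCAG order.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Positions 101-150 of the insert, copied from the sequence listing above
row_101 = "CTTTATCAGAAGAAGATGGCCACTGCCCAGTTGCAGAGGACTCCCATGAG"
orf_start = 116  # 1-based position of the ATG, as stated in the ORF line

print(translate(row_101[orf_start - 101:]))  # MATAQLQRTPM
```

The translated residues agree with the first residues of the peptide given for frame 2.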



1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE 

51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED 

101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI 

151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP 

201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK 

251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES 

301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN 

351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15al3, frame 2 

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score = 
274, P = 5.7e-22 

TREMBL:SC9877_9 gene: "hop1"; S.cerevisiae chromosome IX cosmid 9877., 
N = 2, Score = 126, P = 7.1e-09 

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 126, P = 7.8e-08 



>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 
Length = 562 

HSPs : 

Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22 
Identities = 84/290 (28%), Positives = 145/290 (50%) 

Query: 22 TEHQSLVLVKRL LA VSVSC ITYLRGI FPECAYGTRYLDDLCVKILREDKNCPGSTQLVKW 81 

TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + S +L+ W 

Sbjct: 11 TEQDSLLLTRNLLRI AI FNIS YIRGLFPEKYFNDKS VPZiLDMKI KKLMPMDAESRRLIDW 70 

Query: 82 M-LGCYDALQKKYVYT NPEDPQTISECYQFKFKYTNNGP--LMDFISK— NQSN 130 

M G YDALQ+KY+ T D I E Y F F Y+++ +M I++ N+ N 

Sbjct: 71 MEKGVYDALQRKYLKTLMFSICETVDGPMIEK-YSFSt'SYSDSDSQDVMMNINRTGNKKN 129 

Query: 131 ESSMLST DTKKASILLIRKI YILMQNLGPLPNDVCLTMKLFYYDEVTPPDYQPP 184 

ST + ++ ++R + LM+ L +P++ + MKL YYD+VTPPDY+PP 

Sbjct: 130 GGIFNSTADITPNQMRSSACKMVRTLVQLMRTLDKMPDERTIVMKLLYYDDVTPPDYEPP 189 

Query: 185 GFKD— GDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTT ERERMENIDSTILS 235 

F+ D ++ P+ + +G V++ + +KV + E + M++ D + 

Sbjct: 190 FFRGCTEDEAQYVWTKNPLRMEIGNVNSKHLVLTLKVKSVLDPCEDENDDMQD-DGKSIG 248 

Query: 236 PKQIKTPFQKILRDKDVEDEQEHY TSDDLDIETKMEEQEKNPASSE 281 

P + Q D ++ QE+ DD D E ++ ++PA +E 

Sbjct: 249 PDSVHDD-QPSDSDSEISQTQENQFIVAPVEKQDDDDGEVDEDDNTQDPAENE 300 



Pedant information for DKFZphtes3_15al3, frame 2 



Report for DKFZphtes3_15al3.2 



[LENGTH] 387 

[MW] 44417.64 

[pI] 5.57 

[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from 
Arabidopsis thaliana chromosome 1, complete sequence. 9e-23 

[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11 

[PIRKW] nucleus 2e-09 

[PIRKW] zinc finger 2e-09 
[PIRKW] DNA binding 2e-09 

[PROSITE] MYRISTYL 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 



SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh 

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD 

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK 

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKKANGNQPVKSSKENRKRSQHESGR 

PRD ccccccchhhhhhhhhhcccccccccccccceeeeeccccccccchhhhhhhhhhcccce 

SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI 

PRD eeeeecccccccccccccccccccccc 
->322 CK2_PHOSPHO_SITE PDOC00006 

PS00006 346->350 CK2_PHOSPHO_SITE PDOC00006 

PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 

PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006 

PS00008 84->90 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_15al3.2) 



DKFZphtes3_15c24 



group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to 
2-hydroxyacid dehydrogenases. 

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenase signature. 
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase 
(EC 1.1.1.28) catalyzes the oxidation of D-lactate to pyruvate; D-glycerate dehydrogenase 
(EC 1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; and 
3-phosphoglycerate dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of 
D-3-phosphoglycerate to 3-phosphohydroxypyruvate. 
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase. 

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent 
pathways and as a new enzyme for biotechnological production processes. 



strong similarity to C.elegans T03F1.1 

potential start at Bp 55 matches Kozak consensus PuCCatgG 

Sequenced by GBF 

Locus: unknown 

Insert length: 1956 bp 

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903 
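
The start-codon context noted above can be inspected directly. The sketch below assumes the commonly cited Kozak context, a purine at position -3 and a G at position +4 relative to the A of the ATG; the function name `kozak_context` and the constant `FIRST_60` are illustrative, and the 60 nucleotides are the first 60 bases of the insert sequence listed for this clone.

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg_pos: int) -> dict:
    """Report the -3 and +4 bases around a start codon.

    atg_pos is the 1-based position of the A in ATG.
    """
    i = atg_pos - 1  # convert to 0-based index
    if i < 3 or i + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    if seq[i:i + 3].upper() != "ATG":
        raise ValueError("no ATG at the given position")
    minus3 = seq[i - 3].upper()
    plus4 = seq[i + 3].upper()
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in PURINES,
        "g_at_plus4": plus4 == "G",
    }


# First 60 nt of the DKFZphtes3_15c24 insert (positions 1-60)
FIRST_60 = "CGAAGGCGGCGGCGAAGGCCCGGGCTGGGAGCGTTGGCGGCCGGAGTCCCAGCCATGGCG"

print(kozak_context(FIRST_60, 55))
```

For this clone both positions agree with the assumed context: the base at -3 is G and the base at +4 is G.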



1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC 
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG 
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC 
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA 
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA 
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT 
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA 
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC 
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA 
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA 
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA 
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG 
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG 
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA 
701 TACAGCTTAT AATTCCTGGA GAATCTGCTT GTTTTGCGTG TGCTCCACCA 
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT 
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG 
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC 
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC 
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA 
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA 
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT 
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG 
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC 
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGGAAG ACCTCATGGC 
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT 
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT 
1351 TAGGGCAACA TTAATTAATG TATATTCTTA CCTGAATTGT TATACTTTTT 
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC 
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA 
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT 
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA 
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC 
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG 
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC 
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA 
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA 
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG 
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA GACAAAAAAA AAAAAAAAAA 
1951 AAAAAG 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein 
Classification: Metabolism 

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105) 



1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 
51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL 
101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 
151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 
201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 
251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN 
301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS 
351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK 
401 MKNM 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15c24, frame 1 

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid 
T03F1., N = 1, Score = 1204, P = 1.9e-122 

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1 
YAC yUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72 

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus 
fulgidus, N = 1, Score = 218, P = 1.8e-17 

TREMBL:AF022796_4 gene: "moeB"; product: "MoeB"; Staphylococcus 
carnosus molybdenum cofactor biosynthetic gene cluster, complete 
sequence., N = 1, Score = 220, P = 3.7e-16 



>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 
Length = 419 

HSPs: 



Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122 
Identities = 241/367 (65%), Positives = 293/367 (79%) 



Query:  37 RVRIEKMSSEVVDSNPYSRLMALKRMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCG 96 
           R +IEK+S+EVVDSNPYSRLMAL+RMGIV++YE+IR   VA+VGVGGVGSV AEMLTRCG 
Sbjct:  48 RQKIEKLSAEVVDSNPYSRLMALQRMGIVNEYERIREKTVAVVGVGGVGSVVAEMLTRCG 107 

Query:  97 IGKLLLFDYDKVELANMNRLFFQPHQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVEN 156 
           IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA  TL ++NPDV  EVHN+NITT++N 
Sbjct: 108 IGKLILFDYDKVEIANMNRLFYQPNQAGLSKVEAARDTLIHVNPDVQTEVHNFNITTMDN 167 

Query: 157 FQHFMDRISNGGLEEGKPVDLVLSCVDNFEARMTINTACNELGQTWMESGVSENAVSGHI 216 
           F  F++RI  G L +GK +DLVLSCVDNFEARM +N ACNE  Q WMESGVSENAVSGHI 
Sbjct: 168 FDTFVNRIRKGSLTDGK-IDLVLSCVDNFEARMAVNMACNEENQIWMESGVSENAVSGHI 226 

Query: 217 QLIIPGESACFACAPPLVVAANIDEKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNF 276 
           Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF 
Sbjct: 227 QYIEPGKTACFACVPPLVVASGIDERTLKRDGVCAASLPTTMAVVAGFLVMNTLKYLLNF 286 

Query: 277 GTVSFYLGYNAMQDFFPTMSMKPNPQCDDRNCRKQQEEYKKKVAALPKQ-EV-IQEEEEI 334 
           G VS Y+GYNA+ DFFP  S+KPNP CDD +C ++Q+EY++KVA  P   EV + EEE + 
Sbjct: 287 GEVSQYVGYNALSDFFPRDSIKPNPYCDDSHCLQRQKEYEEKVANQPVDLEVEVPEEETV 346 

Query: 335 IHEDNEWGIELVSEVSEEELKNFSGPVPDLPEGITVAYTIPKKQEDSVTELTVEDSGESL 394 
           +HEDNEWGIELV+E SE   +  S    +   G+  AY  P K+ D+ TEL+   +  + 
Sbjct: 347 VHEDNEWGIELVNE-SEPSAEQSSSL--NAGTGLKFAYE-PIKR-DAQTELSPAQA--AT 399 

Query: 395 EDLMAKMKN 403 
            D M  +K+ 
Sbjct: 400 HDFMKSIKD 408 
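
The percentages in the HSP header for this hit appear to be truncated rather than rounded: 241/367 is roughly 65.7% and is reported as 65%. A small Python sketch of that convention follows; integer truncation is an assumption inferred from the reported values, and `hsp_percent` is a hypothetical helper name.

```python
def hsp_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // length


identities = hsp_percent(241, 367)
positives = hsp_percent(293, 367)
print(f"Identities = 241/367 ({identities}%), Positives = 293/367 ({positives}%)")
```

With rounding instead of truncation the two figures would read 66% and 80%, which is one way to tell the conventions apart when checking a report.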



Pedant information for DKFZphtes3_15c24, frame 1 



Report for DKFZphtes3_15c24.1 



[LENGTH] 404 

[MW] 44863.36 

[pI] 4.79 

[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-115 

[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08 

[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 
4e-07 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 
4e-07 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07 

[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 
2e-06 

[BLOCKS] BL01042A Homoserine dehydrogenase proteins 

[PIRKW] thiamine pyrophosphate 1e-07 

[PIRKW] molybdenum 5e-07 

[PIRKW] molybdopterin biosynthesis 5e-07 

[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12 

[PROSITE] D_2_HYDROXYACID_DH_1 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 8.66 % 



SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc 

MEM 

SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP 

SEG xxxxxxxxx 

PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS 

SEG 

PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee 

MEM 

SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID 

SEG 

PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc 

MEM 

SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN 

SEG 

PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc 

MEM 

SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP 

SEG xxxxxxxxxxxxxxx. . .xxxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc 

MEM 

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM 

SEG 

PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc 

MEM 



Prosite for DKFZphtes3_15c24.1 
PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063 

(No Pfam data available for DKFZphtes3_15c24.1) 

DKFZphtes3_15c6 



group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins. 
The novel protein contains 1 transmembrane region. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



unknown 



complete cDNA, complete cds, EST hits 
Sequenced by GBF 
Locus: unknown 



Insert length: 1283 bp 

Poly A stretch at pos. 1264, no polyadenylation signal found 



1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 
51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 
101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 
151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 
201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGGCTTGGAG CCAGAGACAC 
251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 
301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 
351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 
401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 
451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT 
501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 
551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT 
601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 
651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTCCTC CTATCCTCAG 
701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC 
751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 
801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 
851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 
901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 
951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 
1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 
1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 
1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 
1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 
1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 
1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 



1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 
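
The peptide above can be cross-checked against the nucleotide listing by translating the ORF in frame. The sketch below uses the standard genetic code; the names `translate`, `ORF_START_30` and `ORF_END_6` are illustrative, and the two fragments are taken from positions 461-490 and 809-817 of the DKFZphtes3_15c6 insert.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*" and to_stop:
            break
        peptide.append(aa)
    return "".join(peptide)


ORF_START_30 = "ATGGTAGCTATTCCACCCTCTGCCTGCCTG"  # positions 461-490
ORF_END_6 = "TGTCCTTAG"                          # positions 809-817

print(translate(ORF_START_30))
print(translate(ORF_END_6, to_stop=False))
```

The first fragment reproduces the first ten residues of the listed peptide, and the second shows the final two residues followed by the stop codon that lies just past the reported ORF end at 814 bp.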

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 

PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 
76, P = 0.33 



>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana 
Length = 258 

HSPs: 

Score = 76 (11.4 bits), Expect = 4.0e-01, P = 3.3e-01 
Identities = 30/91 (32%), Positives = 44/91 (48%) 

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74 

PG GA P+ R+ F+ PF + +E+ A C P SSL+ A G L 

Sbjct: 52 PGRGA-PLARVTFRH PFRF KKQKELFVAAEVCTPVSSLYCGKKATLVVGNVLP 103 

Query: 75 FSSYPQGPVLLCP HV-PLGCLVEALYNFSLVL 105 

S P+G V+ C HV G L A + +++V+ 
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYATVI 137 



Pedant information for DKFZphtes3_15c6, frame 2 



Report for DKFZphtes3_15c6.2 



[LENGTH] 118 

[MW] 12413.79 

[pI] 7.53 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MYRISTYL 1 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 



SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF 

PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc 

MEM 

SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP 

PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc 

MEM MMMMMMMMMMMMMMMMM . 



Prosite for DKFZphtes3_15c6.2 

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001 

PS00008 70->76 MYRISTYL PDOC00008 

PS00029 84->106 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_15c6.2) 

DKFZphtes3_15gl4 



group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae 
hypothetical protein YOR243c. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to YOR243c 

complete cDNA, complete cds, potential start codon at Bp 35, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3495 bp 

Poly A stretch at pos. 3452, no polyadenylation signal found 



1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 

51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 

101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 

151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 

201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 

251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA 

301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 

351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 

401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 

451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 

501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 

551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 

601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 

651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 

701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 

751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 

801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 

851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 

901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 

951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 

1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 

1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 

1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 

1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 

1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 

1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 

1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 

1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 

1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT 

1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC 

1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 

1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 

1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 

1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 

1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 

1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 

1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 

1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA 

1901 CTACAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG 

1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA 

2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA 

2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC 

2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC 

2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA 

2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC 

2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT 

2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC 

2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG 

2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT 

2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC 

2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG 

2 551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA 

2 601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT 

2 651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA 

2 701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA 
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27 51 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGC 
2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG 

28 51 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC 
2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA 
2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA 
3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG 
3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA 
3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA 
3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG 
3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG 
3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT 
3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA 
3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA 
3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG 
3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 



1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI 

51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN 

101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS 

151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN 

201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN 

251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV 

301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV 

351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK 

401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL 

451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA 

501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY 

551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG 

601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN 

651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD 

701 V 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15gl4, frame 2 

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical 
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp 
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57 

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 516, P = 7.3e-54 

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN 
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34 



>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676 

HSPs: 

Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 151/498 (30%), Positives = 245/498 (49%) 

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249 
+ E V P L +L + EE+ Y A K + F+ +K R +H + 
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE DKSVRTKIHQLL 164 

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307 

+ F N +E+ + N +EK ++ R + G + FTL 

Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224 

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366 

KEN + EA+ + KL +PS YAG KD++A+T Q + + K+ +RL + + + 
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282 

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426 

K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+ 

Sbjct: 283 -KGMI IGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340 

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485 

NY+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA 

Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399 



Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ--AWFS----LPHSMRIFYVHAYTSKIW 539 
             L  MP   + E ALL +L      E+G     A+++    +P ++R  YVHAY S +W 
Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459 

Query: 540 NEAVSYRLETYGARVVQGDLVC-------LDEDIDDENFPNS-------KIHLVTEEEGS 585 
           N   S R+E +G ++V GDLV         L   IDDE+F          +   VT+E+ 
Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519 

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644 
           +  Y +  VVLP  G+++ YP N+ + Q Y DIL  D +     +      ++ G YR + 
Sbjct: 520 SVKYTMEDVVLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579 

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671 
           ++ P +L Y+++   D   +   + +D 
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606 



Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01 
Identities = 40/160 (25%), Positives = 77/160 (48%) 

Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81 

GF G IK +DF+V EID++G++++ T D+ FK+ + +P K +++ + S E 

Sbjct: 55 GFRGQIKQRYTDFLVNEI DQEGKVIHLT-DKG- FKMPK KPQR — SKEEVNAEKES-E 106 

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS — EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138 

R QE + D + +Q +ED + ++ + K + +F D+ ++ 

Sbjct: 107 AARRQEFNV DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKNFEDKSVRTKIH 161 

Query: 139 NFACDVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181 

+RE + ++ E + FIR ++N R + I Q 

Sbjct: 162 QL LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201 

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54 
Identities = 10/23 (43%), Positives = 17/23 (73%) 

Query: 676 SLLISFDLDASCYATVCLKEIMK 698 
           ++++ F L  S YAT+ L+E+MK 
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660 

Pedant information for DKFZphtes3_15gl4 , frame 2 



Report for DKFZphtes3_15gl4.2 



[LENGTH] 701 

[MW] 80700.96 

[pI] 7.31 

[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e-51 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53 

[BLOCKS] BL01268C 

[BLOCKS] BL01268B 

[BLOCKS] BL01268A 

[SUPFAM] hypothetical protein HI0701 3e-06 

[PROSITE] MYRISTYL 7 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 13 

[PROSITE] ASN_GLYCOSYLATION 5 

[KW] Alpha_Beta 

SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI 

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR 

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhheeecccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD 

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhcccccceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV 

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN 

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhheeeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN 

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG 

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeecccccccccccceeeccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD 

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV 

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 



Prosite for DKFZphtes3_15gl4.2 

PS00008 326->332 MYRISTYL PDOC00008 

PS00008 385->391 MYRISTYL PDOC00008 

PS00008 514->520 MYRISTYL PDOC00008 

PS00008 622->628 MYRISTYL PDOC00008 

PS00009 287->291 AMIDATION PDOC00009 

PS00009 436->440 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_15gl4.2) 

DKFZphtes3_15hl 



group: testes derived 

DKFZphtes3_15hl encodes a novel 672 amino acid protein with very weak similarity to several 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 
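
The reported polyadenylation signal can be located by scanning the 3' end of the insert for the canonical hexamers. The sketch below assumes that AATAAA and its common variant ATTAAA are the motifs of interest; the source does not state which motifs its pipeline searched for, and the names `find_polya_signals` and `ROW_2201` are illustrative. The fragment is the row of the listing that starts at position 2201.

```python
SIGNALS = ("AATAAA", "ATTAAA")  # assumed motif set, not confirmed by the source


def find_polya_signals(fragment: str, first_pos: int) -> list:
    """Return (motif, 1-based start position) for each signal hexamer found.

    first_pos is the 1-based position of the first base of the fragment
    within the full insert.
    """
    frag = fragment.upper().replace(" ", "")
    hits = []
    for motif in SIGNALS:
        start = frag.find(motif)
        while start != -1:
            hits.append((motif, first_pos + start))
            start = frag.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Positions 2201-2250 of the DKFZphtes3_15hl insert
ROW_2201 = "CTGCTGCTTTCCATCACTATTTTGCCATTAAATAGGTGTCTTTCACTCTT"

print(find_polya_signals(ROW_2201, 2201))
```

In this row the ATTAAA hexamer starts at 1-based position 2227, one past the reported 2226. That offset would be consistent with the report using a zero-based index, although this reading is an inference and is not stated in the source.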

1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 
51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 

101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 

151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 

201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 

251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 

301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 

351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 

401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 

451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC 

501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 

551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA 

601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 

651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 

701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 

751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 

801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 

851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 

901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 

951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 
1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 
1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 
1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 
1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 
1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 
1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 
1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 
1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 
1401 CAGGCACAAG TGAAGCTGAG AGACTTCGAG TCAGCCGTGA ACAATTTTGA 
1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG 
1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 
1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 
1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 
1651 TGAGAGATGA GCCCGAGAAG GTGGTGAAGC AGTGGGACCA TAGTGAGGAT 
1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 
1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT 
1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 
1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 
1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 
1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 
2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 
2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 
2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 
2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 
2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 
2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 69 bp to 2084 bp; peptide length: 672 
Category: similarity to known protein 
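
The stated peptide length can be checked against the ORF coordinates. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the end coordinate is the last base of the final sense codon (stop codon excluded). Both points are inferred from the numbers in this listing rather than stated in it, and the helper name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: end_bp is the last base of the last sense codon,
    i.e. the stop codon is not part of the stated range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 69 bp to 2084 bp, as stated above
print(orf_peptide_length(69, 2084))
```

Under these assumptions the span of 2016 bases corresponds to the 672 residues reported for this frame.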



1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL 

51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA 

101 LVFYHRGYKL RPDREFRVGT QKAQEAINNS VGSPSSIKLE NKGDLSFLSK 

151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY 

201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER 

251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV 

301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA 

351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF 

401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL 

451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV 

501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE 

551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL 

601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG 

651 KTQFGEIGET KKTGNEMEKE YE 



BLAST P hits 



Entry AF039202_1 from database TREMBL: 

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus 

Hsp70/Hsp90 organizing protein mRNA, complete cds. 

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160 

Entry AI09782_1 from database TREMBL: 

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain 

mRNA, complete cds. 

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623 

Entry S56658 from database PIR: 
stress-induced protein sti1 - soybean 

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153 



Alert BLASTP hits for DKFZphtes3_15h1, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15h1, frame 3 

Report for DKFZphtes3_15h1.3 



[LENGTH] 672 
[MW] 76655.61 
[pI] 5.49 
[HOMOL] PIR:S56658 stress-induced protein sti1 - soybean 6e-10 
[SUPFAM] tetratricopeptide repeat homology 1e-07 
[PROSITE] MYRISTYL 7 
[PROSITE] AMIDATION 3 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 15 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 11 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 4.76 % 



SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM 

SEG 

PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh 

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI 

SEG 

PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh 
SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh 

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc 

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh 

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS 

SEG 

PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh 

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG 

SEG 

PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh 

SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh 

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD 

SEG x 

PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc 

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL 

SEG xxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh 

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET 

SEG 

PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc 

SEQ KKTGNEMEKEYE 

SEG 

PRD cccccccccccc 
PS00008 288->294 MYRISTYL PDOC00008 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 334->340 MYRISTYL PDOC00008 
PS00008 590->596 MYRISTYL PDOC00008 
PS00009 596->600 AMIDATION PDOC00009 
PS00009 603->607 AMIDATION PDOC00009 
PS00009 641->645 AMIDATION PDOC00009 
DKFZphtes3_15i5 



group: cell structure and motility 

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead 
proteins. 

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of 
flagella or axoneme and to the Strongylocentrotus purpuratus (sea urchin) spermatozoa protein 
p63. This protein is important for the maintenance of a planar form of sperm flagellar 
beating. In addition, the novel protein contains a transferrin signature 1 for iron binding. 
The new protein seems to be a part of the human radial spoke heads in spermatozoa. 

BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in modulating the structure of the human spermatozoa 
radial spoke head and in modulating sperm motility in men. 



strong similarity to "radial spokehead" proteins 

complete cDNA, complete cds, 1 EST hit (from a testis library) 
"radial spokehead" part of flagella in Chlamydomonas; this protein 
seems to be part of the sperm motor or tail 

Sequenced by GBF 

Locus: unknown 

Insert length: 2478 bp 

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433 



1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG 

51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG 

101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG 

151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG 

201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT 

251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA 

301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG 

351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA 

401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT 

451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG 

501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC 

551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC 

601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT 

651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG 

701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA 

751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG 

801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC 

851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC 

901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC 

951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC 

1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA 

1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC 

1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA 

1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC 

1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA 

1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC 

1301 GCACGGCGAG GAGGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG 

1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC 

1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT 

1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC 

1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC 

1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC 

1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA 

1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC 

1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA 

1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC 

1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG 

1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA 

1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC 

1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC 

2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC 

2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA 

2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC 

2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA 

2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG 

2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC 
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA 
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA 
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG 
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



No BLAST result 



Medline entries 



86251010: 

Molecular cloning and expression of flagellar radial spoke and dynein genes of Chlamydomonas. 

81142496: 

Radial spokes of Chlamydomonas flagella: polypeptide composition and phosphorylation of 
stalk components. 

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 

Peptide information for frame 3 



ORF from 144 bp to 2294 bp; peptide length: 717 
Category: strong similarity to known protein 



1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA 
51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP 
101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN 
151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ 
201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE 
251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP 
301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG 
351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV 
401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV 
451 NARKTKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY 
501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP 
551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE 
601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH 
651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT 
701 EEEEEGEEEE EGEETDD 

BLASTP hits 
Entry U73123_1 from database TREMBL: 

product: "radial spokehead"; Strongylocentrotus purpuratus radial 
spokehead mRNA, complete cds. 

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523 
Entry B44498 from database PIR: 

radial spoke protein 6 - Chlamydomonas reinhardtii 

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264 



Alert BLASTP hits for DKFZphtes3_15i5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15i5, frame 3 



Report for DKFZphtes3_15i5.3 



[LENGTH] 717 
[MW] 80913.61 
[pI] 4.36 
[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds. 1e-130 
[PROSITE] TRANSFERRIN_1 1 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 21.48 % 



SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR 

SEG . . . . xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccccccccccccc 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG xxxx 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhhhhhh 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL 

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhccccchhhh 

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP 

SEG xxxxxxxxxxxxxxxx . . 

PRD hhhchhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc 

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV 

SEG xxx 

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhhhhhcccchhhhhhhh 

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeecccccccccccccccccc 

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY 

SEG 

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhcccccccccccccccchhhh 

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecchhhh 

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc 

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA 

SEG 

PRD cccccccccccccccccccceeeeeeccccceeeecccccceeeeeeccccccccccccc 

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD 

SEG xxxxxxxxxxxxxx. . .xxxxxxxxxxxxxx. . . 

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
DKFZphtes3_15j18 



group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 



complete cDNA, complete cds, few EST hits 



Sequenced by GBF 
Locus: unknown 



Insert length: 905 bp 

Poly A stretch at pos. 839, polyadenylation signal at pos. 815 



1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 

51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGGAGGG CACAGCTTGT 

101 CCAGCTGGGA TGTTTGGGTG CCCTGTGAGA TGCCCCAAGC CACCAACCCA 

151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 

201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG 

251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC 

301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 

351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 

401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA 

451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG 

501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 

551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA 

601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 

651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 

701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 

751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT 

801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 

851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA 

901 AAAAG 
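
The reported positions of the polyadenylation signal and the poly A stretch can be reproduced from the 3' end of the insert. The sketch below makes several assumptions that are not stated in the listing: the signal is the canonical AATAAA hexamer, a poly A stretch is a run of at least 10 consecutive A residues, and reported positions are 0-based offsets. The function name and threshold are hypothetical and the annotation software's actual criteria may differ.

```python
import re

# Bases 801-860 of the insert, copied from the sequence above.
TAIL = "".join([
    "GTTATGAATG", "TGGTCAATAA", "ATGTTATTTG",
    "TGAAACTCTA", "AAAAAAAAAA", "AAAAAAAAAG",
])
TAIL_OFFSET = 800  # 0-based offset of the first base of TAIL


def find_tail_features(fragment, offset, min_run=10):
    """Return 0-based (signal_pos, poly_a_pos); None where a feature is absent."""
    hit = fragment.find("AATAAA")  # canonical signal only (assumption)
    signal_pos = offset + hit if hit >= 0 else None
    run = re.search("A{%d,}" % min_run, fragment)  # first long run of A
    poly_a_pos = offset + run.start() if run else None
    return signal_pos, poly_a_pos


print(find_tail_features(TAIL, TAIL_OFFSET))
```

With these assumptions the sketch returns positions 815 and 839, which agree with the values stated for this clone. Agreement for one clone does not establish that the same rule was applied throughout.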



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 553 bp; peptide length: 148 
Category: putative protein 



1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA 
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 
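
The start of this peptide can be reproduced by translating the cDNA from the first base of the ORF. The following minimal sketch uses the standard genetic code and translates only bases 110 to 139 of the insert, copied from the sequence above. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


orf_start = "ATGTTTGGGTGCCCTGTGAGATGCCCCAAG"  # bases 110-139 of the insert
print(translate(orf_start))
```

The ten codons translate to MFGCPVRCPK, the first ten residues of the peptide listed above.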

BLAST P hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15j18, frame 2 

Report for DKFZphtes3_15j18.2 

[LENGTH] 148 
[MW] 15665.78 
[pI] 8.91 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 
[KW] Irregular 



SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 

PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 

PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 

PRD cccccccccccccccccccccccccccc 



Prosite for DKFZphtes3_15j18.2 



PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 42->48 MYRISTYL PDOC00008 

PS00008 49->55 MYRISTYL PDOC00008 
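
The casein kinase II entry in the table above can be reproduced with a simple pattern scan. The sketch below assumes the PROSITE PS00006 consensus [ST]-x(2)-[DE], expressed as a regular expression, and assumes the table reports a 1-based start followed by start plus pattern length. That coordinate convention is inferred from the entries here, not documented in the listing, and the helper name is hypothetical.

```python
import re

# 148-residue peptide for frame 2, copied from the listing above.
PEPTIDE = (
    "MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGA"
    "PKSRVRPAFISPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMART"
    "APHPSSLHPFPAWDSSSPVHCGAPLPSAHGGFPRARAEGSWSQPGAGS"
)

# [ST]-x(2)-[DE]; the lookahead lets overlapping sites be reported.
CK2_SITE = re.compile(r"(?=([ST]..[DE]))")


def scan(pattern, seq):
    """Return (start, start + length) pairs using 1-based start positions."""
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


print(scan(CK2_SITE, PEPTIDE))
```

The scan finds a single site, reported as (82, 86), matching the CK2_PHOSPHO_SITE row above.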



(No Pfam data available for DKFZphtes3_15j18.2) 
DKFZphtes3_15j3 



group: nucleic acid management 

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins with 
unknown function. 

The novel protein contains an RNA recognition motif, predicted by Pfam, and is therefore expected to bind 
RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein 
seems to be a new RNA-modifying protein. 

The new protein can find application in modulating RNA metabolism in human cells and as a 
tool for biotechnological manipulations. 



"44M2.3"; product, differences to genmodel, similarity to ribonuclease H 

complete cDNA, complete cds, EST hits 
YGR276c = ribonuclease H 
differences to genmodel of 44M2.3 

Sequenced by GBF 

Locus: /map="16p11.2" 

Insert length: 2695 bp 

Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579 



1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 
51 GCCCGTCTTC TCTTCTGCGT TTCCCGGGCT AGGGGGCGTG GGGAGTGGTT 
101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 
151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 
201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 
251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT 
301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 
351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 
401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 
451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA 
501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA 
551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT 
601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 
651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 
701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 
751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 
801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 
851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC 
901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 
951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 
1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC 
1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG 
1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG 
1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG 
1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA 
1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA 
1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC 
1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG 
1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT 
1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG 
1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA 
1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC 
1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT 
1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT 
1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG 
1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT 
1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG 
1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA 
1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 
1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 
2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 
2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 
2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 
2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 
2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC 
2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 
2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA 
2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG 
2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC 
2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC 

2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA 

2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG 

2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 188 bp to 2416 bp; peptide length: 743 
Category: similarity to known protein 



1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL 
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF 
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA 
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF 
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV 
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL 
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI 
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE 
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN 
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK 
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL 
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP 
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW 
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN 
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15j3, frame 2 

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; 
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence., 
N = 2, Score = 1827, P = 2.1e-284 

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid 
C05C8., N = 2, Score = 370, P = 1.7e-34 

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 334, P = 1.8e-27 

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative 
exonuclease"; S. pombe chromosome I cosmid c637., N = 3, Score = 326, P 
= 2.8e-27 



>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo 
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 
Length = 547 

HSPs: 

Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 358/373 (95%), Positives = 358/373 (95%) 

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164 

MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 
Sbjct: 1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60 

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224 

AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 
Sbjct: 61 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120 
Query: 225 NSPLFGLDCEM---------------CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269 

NSPLFGLDCEM               CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180 

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329 

DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240 

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389 

SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300 

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449 

LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360 

Query: 450 NCQTIKCLSNKEV 462 

NCQTIKCLSNKEV 
Sbjct: 361 NCQTIKCLSNKEV 373 

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 175/179 (97%), Positives = 177/179 (98%) 

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597 

L  ++VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427 

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657 

ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487 


Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716 

STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546 
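
The percentages in the Identities and Positives lines of this alignment can be recomputed from the stated counts. In the sketch below, integer division is used because the printed values appear to be truncated rather than rounded (358/373 is about 95.98 %, printed as 95 %). That is an observation from these two lines, not a documented property of the program that produced them, and the function name is hypothetical.

```python
import re


def parse_identity_line(line):
    """Extract {label: (matches, length, percent)} from an Identities line.

    Percent is truncated toward zero (assumption based on the values shown).
    """
    result = {}
    for label, hits, total in re.findall(
        r"(Identities|Positives) = (\d+)/(\d+)", line
    ):
        hits, total = int(hits), int(total)
        result[label] = (hits, total, hits * 100 // total)
    return result


print(parse_identity_line("Identities = 358/373 (95%), Positives = 358/373 (95%)"))
print(parse_identity_line("Identities = 175/179 (97%), Positives = 177/179 (98%)"))
```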



Pedant information for DKFZphtes3_15j3, frame 2 



Report for DKFZphtes3_15j3.2 



[LENGTH] 743 
[MW] 83536.58 
[pI] 8.87 
[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094C] 1e-10 
[FUNCAT] 04.05.05 mRNA processing (5'-end, 3'-end processing and mRNA degradation) [S. cerevisiae, YGL094C] 1e-10 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 16 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] RNA recognition motif. (aka RRM, RBD, or RNP domain) 
[KW] Alpha_Beta 



SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE 

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehhhhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY 

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL 

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh 
SEQ LPPDAVLVGHSLDLDLRALKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR 

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc 

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF 

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP 

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG 

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc 

SEQ SLSGLGLMGIKEEEESAGPGLCS 

PRD cccccccchhhhhhccccccccc 



Prosite for DKFZphtes3_15j3.2 

PS00001 219->223 ASN_GLYCOSYLATION PDOC00001 
PS00001 419->423 ASN_GLYCOSYLATION PDOC00001 
PS00002 723->727 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005 
PS00005 238->241 PKC_PHOSPHO_SITE PDOC00005 
PS00005 279->282 PKC_PHOSPHO_SITE PDOC00005 
PS00005 287->290 PKC_PHOSPHO_SITE PDOC00005 
PS00005 447->450 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005 
PS00005 579->582 PKC_PHOSPHO_SITE PDOC00005 
PS00005 605->608 PKC_PHOSPHO_SITE PDOC00005 
PS00005 630->633 PKC_PHOSPHO_SITE PDOC00005 
PS00005 643->646 PKC_PHOSPHO_SITE PDOC00005 
PS00005 658->661 PKC_PHOSPHO_SITE PDOC00005 
PS00005 678->681 PKC_PHOSPHO_SITE PDOC00005 
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005 
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006 
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006 
PS00006 421->425 CK2_PHOSPHO_SITE PDOC00006 
PS00006 458->462 CK2_PHOSPHO_SITE PDOC00006 
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006 
PS00006 630->634 CK2_PHOSPHO_SITE PDOC00006 
PS00007 370->379 TYR_PHOSPHO_SITE PDOC00007 
PS00008 27->33 MYRISTYL PDOC00008 
PS00008 186->192 MYRISTYL PDOC00008 
PS00008 575->581 MYRISTYL PDOC00008 
PS00008 714->720 MYRISTYL PDOC00008 
PS00008 720->726 MYRISTYL PDOC00008 
PS00009 337->341 AMIDATION PDOC00009 


Pfam for DKFZphtes3_15j3.2 

HMM_NAME RNA recognition motif. (aka RRM, RBD, or RNP domain) 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
IY+ +++ +T +E+L + + F + + + +++D G+ + ++F +F++ 
Query 571 IYLSGVS-ETFKEQLLQEPRLFLGLEAVILPKDLKSGKQKKYCFLKFKS 618 

HMM EEDAekAIdeMNG..meFmGRrlRV* 
+A+ A+ + G ++ GR + 
Query 619 FGSAQQALNILTGKDWKLKGRHALT 643 
DKFZphtes3_15k11 



group: signal transduction 

DKFZphtes3_15k11 encodes a novel 958 amino acid protein whose C-terminus is identical to the human 
KIAA0781 protein and which shows high similarity to protein kinases. 

The novel protein contains a protein kinase ATP-binding region signature and a 
serine/threonine protein kinase active-site signature. The related murine kinase was cloned 
from the myocardium of the developing heart. 

The new protein can find application in modulation of intracellular signal pathways dependent 
on this kinase. 

KIAA0781, 5' extension 

complete cDNA, complete cds, potential start at bp 97, EST hits 
Sequenced by GBF 
Locus: /map="11" 
Insert length: 4868 bp 

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 



1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 
51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGGGG CCCAGCATGG 
101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 
151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 
201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 
251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 
301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 
351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 
401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 
451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG 
501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 
551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 
601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 
651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 
701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 
751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 
801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 
851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 
901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 
951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 
1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 
1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 
1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 
1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA
1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 
1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC 
1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 
1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 
1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 
1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 
1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 
1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 
1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 
1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 
1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 
1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG 
1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 
1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC 
1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 
1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 
2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 
2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 
2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 
2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 
2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 
2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 
2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 
2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 
2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT
2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC
2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA
2551 CAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCCACCAC CACGACAGCC
2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG 
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG 
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 

2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 

3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA 
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG 
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA 
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA 
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT 
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 

3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 

4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA 
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT 
4151 TATATTACTA ATAAAACTTA ACCAACACTT ACAATTCAGT CATCAAAGTA 
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT 
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC 
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG 
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC 
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT 
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA 
4851 CGCGTGAAAA AAAAAAAG 



BLAST Results 



Entry HSG4921 from database EMBL: 
human STS SHGC-37164. 
Score = 1605, P = 1.9e-66, identities = 349/369 

Entry AB018324 from database EMBL: 

Homo sapiens mRNA for KIAA0781 protein, partial cds. 
Score = 10725, P = 0.0e+00, identities = 2145/2145 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 2874 bp; peptide length: 959 
Category: known protein 
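The frame 1 peptide below follows from reading the cDNA in triplets from its first base. As a minimal sketch, assuming the standard genetic code and using a hypothetical helper name `translate`, the first thirty bases of the insert and the region around the potential start at bp 97 can be translated as follows.

```python
# Standard genetic code, with codons ordered by base in the sequence T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=1):
    """Translate `dna` in the given 1-based forward frame (1, 2 or 3).

    Incomplete trailing codons are ignored; stop codons are shown as '*'.
    """
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


FIRST_30 = "GAGCAAGCGGAGCGGCCGTCGCCCAAGCCA"  # bases 1-30 of the listing
START_REGION = "ATGGTCATGGCG"                # bases 97-108 of the listing

print(translate(FIRST_30))
print(translate(START_REGION))
```

Because bp 97 is the first base of codon 33 in frame 1, the methionine at the potential start falls at residue 33 of the frame 1 peptide.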



1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG 
51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV 
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE 
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK 
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV 
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH 
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD 
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ 
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ 
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ 
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF 
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL 
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP 
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ 
951 HNGYVLVN 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15k11, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15k11, frame 1



Report for DKFZphtes3_15k11.1

[LENGTH] 926
[MW] 103915.77
[pI] 5.70

[HOMOL] TREMBL:AB018324_1 gene: 'KIAA0781'; mRNA for KIAA0781 protein, partial cds. 0.0



[ FUNCAT ] 

8e-76 

[FUNCAT] 

[ FUNCAT ] 

[FUNCAT] 

[FUNCAT] 

[ FUNCAT ] 

3e-56 

[FUNCAT] 

[FUNCAT] 

[ FUNCAT ] 

[FUNCAT] 

[ FUNCAT ] 

[ FUNCAT ] 

[FUNCAT] 

[ FUNCAT ] 

repair) 

[FUNCAT] 

[FUNCAT] 

[FUNCAT] 

[FUNCAT J 

[ FUNCAT ] 

[FUNCAT] 

[FUNCAT) 

[FUNCAT] 

YPL031C] 

[FUNCAT] 

1e-23

[ FUNCAT ] 

[FUNCAT] 

[FUNCAT] 

[ FUNCAT ] 



01.05.04 regulation of carbohydrate utilization 

11.01 stress response [S. cerevisiae, YDR477w] 8e-76 
30.03 organization of cytoplasm [S. cerevisiae, 

98 classification not yet clear-cut [S. cerevisiae. 



product: "KIAA0781 protein"; Homo sapiens 
[S. cerevisiae, YDR477w] 



YDR477w] 8e-76 
YCL024W] 4e-58 



03.25 cytokinesis 



[S. cerevisiae, YDR507c] 3e-56 



03.04 budding, cell polarity and filament formation [S. cerevisiae, 



YDR507C] 
1e-53



30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 

03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53
30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53
99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51 

03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42 
03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42 

10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42 
11.04 dna repair (direct repair, base excision repair and nucleotide excision 
[S. cerevisiae, YPL153c] 3e-42 



03.01 cell growth 



[S. cerevisiae, YFR014c] 5e-42 



03.16 dna synthesis and replication 
03.10 sporulation and germination 
08 

06 

10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26 



[S. cerevisiae, YMR001c] 2e-34
[S. cerevisiae, YGL180w] 1e-27
13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27

13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27



04 



99 other transcription activities [S. cerevisiae, YER129w] 3e-26 

trehalose) [S. cerevisiae, 



1e-23



[S. cerevisiae, YPL031c] 
YPL031c] 1e-23



[S. 



[FUNCAT] 
[ FUNCAT ] 
[FUNCAT] 
3e-19 
[FUNCAT] 
[ FUNCAT ] 
[ FUNCAT ] 
4e-18 
[FUNCAT] 
palmitylation, 
[FUNCAT] 
[FUNCAT] 
YNL183C] 2e-14 



02.19 metabolism of energy reserves (glycogen 

01.04.04 regulation of phosphate utilization 

04.05.01.04 transcriptional control [S. cerevisiae 
03.13 meiosis [S. cerevisiae, YOR351c] 2e-23 
10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21 

03.07 pheromone response, mating-type determination, sex-specific proteins 
evisiae, YHL007c] 8e-21 

09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20 

10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20 

04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 



10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18 
10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18
04.03.99 other trna-transcription activities 



[S. cerevisiae, YOR061w] 



06.07 protein modification (glycosylation, acylation, myristylation,
farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17

05.07 translational control [S. cerevisiae, YDR283c] 2e-16 
01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 2e-14
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 5e-14
[FUNCAT] c energy conversion [M. genitalium, MG109] 2e-12
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 1e-10
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 4e-09
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 1e-07
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 1e-07
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL154c] 2e-04

[BLOCKS] BL00415A Synapsins proteins 

[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 

[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-78
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 1e-81
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 5e-89
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 5e-86
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 3e-80
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 6e-70
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-95
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 7e-71
[SCOP] d1ydse_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 2e-96
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 2e-72
[SCOP] d1cdka_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 5e-97
[SCOP] d2hckb3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 2e-68
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 3e-78
[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 1e-58

[EC] 2.7.1.117 Myosin-light-chain kinase 3e-49 

[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH) ] kinase 4e-78 

[EC] 2.7.1.38 Phosphorylase kinase 3e-41 

[EC] 2.7.1.37 Protein kinase 7e-45 

[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42 

[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78

[PIRKW] phosphotransferase 3e-93 

[PIRKW] nucleus 2e-74

[PIRKW] calcium 2e-40 

[PIRKW] transferase 3e-33 

[PIRKW] duplication 2e-32 

[PIRKW] tandem repeat 7e-45 

[PIRKW] phorbol ester binding 4e-33 

[PIRKW] zinc 4e-33 

[PIRKW] ion transport 1e-32

[PIRKW] cell cycle control 1e-45

[PIRKW] serine/threonine-specific protein kinase 2e-97

[PIRKW] oncogene 1e-34

[PIRKW] phospholipid binding 2e-32 

[PIRKW] autophosphorylation 2e-74 

[PIRKW] brain 6e-36 

[PIRKW] heterotetramer Se-38 

[PIRKW] mitosis 1e-45

[PIRKW] polymer 5e-41 

[PIRKW] magnesium 6e-80 

[PIRKW] ATP 2e-97 

[PIRKW] polyprotein 1e-34

[PIRKW] alternative initiators 2e-31 

[PIRKW] phosphoprotein 2e-74 

[PIRKW] apoptosis 8e-38 

[PIRKW] cGMP binding 4e-33 

[PIRKW] glycoprotein 3e-36 

[PIRKW] skeletal muscle 8e-38 

[PIRKW] protein kinase 2e-50 

[PIRKW] testis 5e-41 

[PIRKW] cAMP binding 8e-38

[PIRKW] transforming protein 4e-33 

[PIRKW] purine nucleotide binding 7e-52 

[PIRKW] calcium binding 7e-45 

[PIRKW] alternative splicing 5e-42 

[PIRKW] P-loop 7e-52 

[PIRKW] lipoprotein 8e-38 

[PIRKW] proto-oncogene 4e-33 

[PIRKW] segmentation 1e-34

[PIRKW] core protein 1e-34
T PTRTCW 1
L r i rvi\ ir¥ j 


n\u scls 8 s — 38 


r pTRKWl 


mvTT t V 1 ^1*1 rin Oo-^O 
Hiy J- J- o i_.yj.ci L. x i_Jl l D c JO 


[PIRKW] EF hand 7e-45
[PIRKW] cell division 3e-49
[PIRKW] homodimer 1e-32


[ PI RKW ] 


r^lmodnl in hi nHi nn Sp- 47 

L,UXLU*JUU J L 11 U J_ 1 1 LA _1_ 1 1 LJ JC i <-- 


[SUPFAM] ribosomal protein S6 kinase II 1e-34
[SUPFAM] calcium-dependent protein kinase 7e-45


r qnppAMl 


MD-a rt l vat"pd n T o "t~ p "i n V i n ^ ^ p 6 P — R f) 


[SUPFAM] protein kinase akt 3e-36


TSUPFAM1 


nrotpi n ki nfl^p SPKl 7p — 41 


[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42
[SUPFAM] calmodulin repeat homology 7e-45
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33
[SUPFAM] protein kinase DUN1 6e-36




nrrvfoi n V "i n .a op C ?ot"a io-n 
^i. ULClll MJlaoC L> ->C La Jc J _> 


[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34




dp ^ t* h — a.cenr , i^ , t~pfi rrrrifpi n lc i n A i p 8 p — 3 ft 

Us^ CI oaoU^laLCLJ |JLULCi.Il Milage <-> »J U 


[SUPFAM] pleckstrin repeat homology 3e-36


r SUPFAM 1 


Vuti n ypnpat" h oirin lorrv p — 3 fl 

oxi^y i. ±11 i c^ca l ii uiiilj J,u^y uc j u 




protein kinase homology 8e-99


[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38


r SlTPFZiM 1 


pi ULClII )VJ.i]ci JC; V_, iJ.ll(_ U_L 1 1U J- 1 1 y J_ trJJtrcl L. 1H_)1111J_L UL) y 1 C O -J> 


[SUPFAM] protein kinase C delta 2e-32


r SUPFAMl 
L our r nu j 






nrnl"P i n Vi sip rHfl 1 a — Zl "i 

[Jl ULCJ.ll MllaDC J — U J. <3 1J 




H naoo-rol a t* pH t~ v a n c ■F/"i r~m i nn nt* i"it" p i n Tp-R fl 

MllaoC J_ C _L d L. cu L. J. al la J_ J. lll-L 1 iy pi ULCJ.lt C J LI 


f SUPFAM 1 


r , ^2+/r~fllmriHii1 i n-HpnpnHpnt' n Tot p i n lei n ?* ^ p T ftp — 4? 

L> d i r CI J- 1 L1W LI LI J L 1 1 LJ^p^llLA^llL> p 1 LJ L C ± 11 f. 1 1 1 u J C J- U G i £. 




H riaco i nt*oi*3i"t' i An H r\m ai n Vi^Tnlr-irfw "7 P — £ 1 

Mlla ±11 Lc i aL. L 1L>1I H Ul tld _L 11 llLJ.LiLJ-L.Ljyy / C J. 




y a y aKL poxypjiOLcxil ±c 


[PROSITE] PROTEIN_KINASE_ATP 1
[PROSITE] MYRISTYL 3
[PROSITE] AMIDATION 2
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 12.31 %



SEQ MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVN
SEG
1ctpE EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC

SEQ LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLNESEAR
SEG
1ctpE HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHH

SEQ RKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFFKSGELLATWCGSP
SEG
1ctpE HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC-CCCCCG

SEQ PYAAPEVFEGQQYEGPQLDIWSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRIPYFM
SEG
1ctpE GGCCHHHHHCCCBC-HHHHHHHHHHHHHHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTT

SEQ SEDCEHLIRRMLVLDPSKRLTIAQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV
SEG
1ctpE CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG

SEQ LRLMHSLGIDQQKTIESLQNKSYNHFAAIYFLLVERLKSHRSSFPVEQRLDGRQRRPSTI
SEG
1ctpE

SEQ AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT
SEG
1ctpE

SEQ PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ
SEG xxxxxxxxxxx
1ctpE

SEQ RRHTLSEVTNQLVVMPGAGKIFSMNDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML
SEG
1ctpE






SEQ ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRSPVSFREGRRASDTSLTQGIVAFRQ
SEG
1ctpE

SEQ HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST
SEG xxxxxxxxxxxxxxxx. . . . xxxxxxxxxxxx .
1ctpE

SEQ LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK
SEG xxxxxxxxxxxxx
1ctpE

SEQ RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP
SEG xxxxxxxxxxx xxxxxxxxxxxxxxx
1ctpE

SEQ SSEQMQYSPFLSQYQEMQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1ctpE

SEQ PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL
SEG xxx
1ctpE

SEQ SELPGLFDCEMLDAVDPQHNGYVLVN
SEG
1ctpE



Prosite for DKFZphtes3_15k11.1



PS00001  115->119  ASN_GLYCOSYLATION   PDOC00001
PS00001  320->324  ASN_GLYCOSYLATION   PDOC00001
PS00004  258->262  CAMP_PHOSPHO_SITE   PDOC00004
PS00004  355->359  CAMP_PHOSPHO_SITE   PDOC00004
PS00004  481->485  CAMP_PHOSPHO_SITE   PDOC00004
PS00004  584->588  CAMP_PHOSPHO_SITE   PDOC00004
PS00005  257->260  PKC_PHOSPHO_SITE    PDOC00005
PS00005  339->342  PKC_PHOSPHO_SITE    PDOC00005
PS00005  420->423  PKC_PHOSPHO_SITE    PDOC00005
PS00005  475->478  PKC_PHOSPHO_SITE    PDOC00005
PS00005  534->537  PKC_PHOSPHO_SITE    PDOC00005
PS00005  545->548  PKC_PHOSPHO_SITE    PDOC00005
PS00005  554->557  PKC_PHOSPHO_SITE    PDOC00005
PS00005  567->570  PKC_PHOSPHO_SITE    PDOC00005
PS00005  579->582  PKC_PHOSPHO_SITE    PDOC00005
PS00005  670->673  PKC_PHOSPHO_SITE    PDOC00005
PS00006   42->46   CK2_PHOSPHO_SITE    PDOC00006
PS00006   54->58   CK2_PHOSPHO_SITE    PDOC00006
PS00006  128->132  CK2_PHOSPHO_SITE    PDOC00006
PS00006  292->296  CK2_PHOSPHO_SITE    PDOC00006
PS00006  359->363  CK2_PHOSPHO_SITE    PDOC00006
PS00006  394->398  CK2_PHOSPHO_SITE    PDOC00006
PS00006  450->454  CK2_PHOSPHO_SITE    PDOC00006
PS00006  458->462  CK2_PHOSPHO_SITE    PDOC00006
PS00006  484->488  CK2_PHOSPHO_SITE    PDOC00006
PS00006  503->507  CK2_PHOSPHO_SITE    PDOC00006
PS00006  515->519  CK2_PHOSPHO_SITE    PDOC00006
PS00006  534->538  CK2_PHOSPHO_SITE    PDOC00006
PS00006  579->583  CK2_PHOSPHO_SITE    PDOC00006
PS00006  878->882  CK2_PHOSPHO_SITE    PDOC00006
PS00006  893->897  CK2_PHOSPHO_SITE    PDOC00006
PS00007  672->680  TYR_PHOSPHO_SITE    PDOC00007
PS00007  100->108  TYR_PHOSPHO_SITE    PDOC00007
PS00008  372->378  MYRISTYL            PDOC00008
PS00008  871->877  MYRISTYL            PDOC00008
PS00008  905->911  MYRISTYL            PDOC00008
PS00009  134->138  AMIDATION           PDOC00009
PS00009  582->586  AMIDATION           PDOC00009
PS00107   26->50   PROTEIN_KINASE_ATP  PDOC00100
PS00108  138->151  PROTEIN_KINASE_ST   PDOC00100
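Hits of this kind can be reproduced by converting a Prosite pattern into a regular expression and scanning the protein. The sketch below is illustrative only: it uses the casein kinase II pattern [ST]-x(2)-[DE] (PS00006) on the first 144 residues of the Pedant protein, and it reports each hit as a 1-based start together with an end equal to start plus pattern length, which is an inferred reading of the coordinate convention in the listing rather than a documented one. The helper name `scan` is hypothetical.

```python
import re

# First 144 residues of the Pedant protein, taken from the SEQ lines above.
N_TERM = (
    "MVMADGPRHLQRGPVRVGFYDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVN"
    "LEKIYREVQIMKMLDHPHIIKLYQVMETKSMLYLVTEYAKNGEIFDYLANHGRLNESEAR"
    "RKFWQILSAVDYCHGRKIVHRDLK"
)

# PS00006, [ST]-x(2)-[DE], written as a regular expression.
CK2_PATTERN = "[ST].{2}[DE]"


def scan(pattern, protein):
    """Return (start, end) pairs for every match, overlapping ones included.

    start is 1-based; end is start + match length (assumed convention).
    A lookahead is used so that overlapping sites are not skipped.
    """
    hits = []
    for match in re.finditer("(?=(%s))" % pattern, protein):
        start = match.start() + 1
        end = start + len(match.group(1))
        hits.append((start, end))
    return hits


ck2_hits = scan(CK2_PATTERN, N_TERM)
print(ck2_hits)
```

The same function accepts any other pattern once it is rewritten as a regular expression, for example `N[^P][ST][^P]` for N-glycosylation sites.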



Pfam for DKFZphtes3_15k11.1

HMM_NAME Eukaryotic protein kinase domain
HMM *YeigRiIGeGsFGtVY]cCiWr . TGelVAIKIIkkrsms F1REI
Y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+
Query 20 YDIEGTLGKGNFAVVKLGRHRITKTEVAIKIIDKSQLDAVNLEKIYREV 68

HMM qlMRrLnHPNT IRFYDwFedddDHI YMIMEYMeGGDLFDYIrrngpMsEw
QIM++L+HP+II++Y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E
Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 117

HMM elrf IMyQILrGMeYLHSMgllHRDLKPENILIDeNgqIKIcDFGLARqM
E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++
Query 118 EARRKFWQILSAVDYCHGRKIVHRDLKAENLLLDNNMNIKIADFGFGNFF 167

HMM nnYerMtt f CGTPWYMMAPEVIImg . nyYt tkVDMWSFGCILWEMMTGep
+++E++ T CG+P+Y APEV +G +Y +++ D+WS+G++L+ +++G +
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 215

HMM PFyddnMemlmrliqrf rrpf WpnCSeElyDFMrwCWnyDPekRPTFrQI
PF++ ++ + + +++ R+++++ +SE++ +++R+++ +DP+KR+T-I- QI
Query 216 PFDGPTLPILRQRVLEGRFRIPYFMSEDCEHLIRRMLVLDPSKRLTIAQI 265

HMM LnHPWF*
+H W+
Query 266 KEHKWM 271
DKFZphtes3_17f10



group: testes derived 

DKFZphtes3_17f10 encodes a novel 710 amino acid protein with weak similarity to neurofilament
proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein may find application in studying the expression profile of testis-specific
genes.

similarity to neurofilament proteins 

Sequenced by GBF 

Locus: unknown 

Insert length: 2533 bp 

Poly A stretch at pos. 2507, no polyadenylation signal found 

1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 
51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 

101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 

151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 

201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 

251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 

301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 

351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 

401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 

451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 

501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 

551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 

601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 

651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 

701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 

751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 

801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 

851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG 

901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 

951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 
1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 
1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 
1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 
1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 
1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 
1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 
1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 
1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 
1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 
1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 
1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 
1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 
1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 
1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 
1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 
1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 
1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 
1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 
1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 
1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 
2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 
2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 
2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 
2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 
2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 
2251 ATTAAAGGGG ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 
2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG
2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 
2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 
2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 
2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA



BLAST Results 
No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 
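The peptide length follows directly from the ORF coordinates. As a small worked check, assuming 1-based inclusive coordinates that exclude the stop codon (the helper name `peptide_length` is illustrative):

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon is not part of the quoted range.
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 18 bp to 2147 bp, as quoted for frame 3 of this clone.
print(peptide_length(18, 2147))
```

The span 18 to 2147 covers 2130 bases, which corresponds to the 710 residues reported for this clone.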



1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS
51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP
101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA
151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV
201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS
251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL
301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL
351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL
401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ
451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA
501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE
551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD
601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV
651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH
701 IELKQRPPEL



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_17f10, frame 3

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P = 7.4e-43

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N = 1, Score = 475, P = 1e-42



>PIR:A37221 neurofilament triplet H protein - rat 
Length = 1,072 



HSPs: 



Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43 
Identities = 185/622 (29%), Positives = 320/622 (51%)
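The percentages in parentheses can be recomputed from the raw counts. The sketch below assumes that the report truncates rather than rounds, an assumption that is consistent with every Identities and Positives line quoted for this hit; the function name `blast_percent` is illustrative.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching columns, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Counts from the first HSP against PIR:A37221.
identities = blast_percent(185, 622)
positives = blast_percent(320, 622)
print(identities, positives)
```

Rounding to the nearest integer would give 30% for 185/622, so the choice between truncation and rounding matters when such values are compared across reports.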



Query: 


33 


Sbjct: 


436 


Query: 


93 


Sbjct: 


496 


Query : 


153 


Sbjct: 


555 


Query: 


212 


Sbjct: 


610 


Query: 


269 


Sbjct: 


670 


Query: 


328 


Sbjct: 


722 


Query: 


384 



SE +1 V+ + + 



+E 



+ + + ++ 



E Q 



E G + + TS 



++EE 



K + AE + P+ K PA+++ P ++ A 



+A+ PAE 



V+ P+T ++P + A A++ +V+ P + + P + PAE + P+ 
- VK- S PATVKS PAEAKS PAEAKS PAEVKS P AT VKS PGEAKS PAEAKS PAE 609 



AE 



P++ +SP E + PAE 



KSP+ V+ E +SP+ K+P+ 



++P +V+ P 



++P E A+ 



++PAE ++P 



TEFQSPLPKETTAEEASAEIQLLAATEPPAD-ETPAEARSPLSEETSAEEAHAEVQS 383 

E + P ++ AE S A + PA+ ++PAEA+SP+ E S E+A + V+ 

AEAKP PAEAKS PAEAKS P AEAKSPAEAKSPAEAKSPV-EVKSPEKAKSPVKEGAK 775 

384 PLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPLSEET-SAEEAPA-EVQSPSAKGV 4 40 
T 7\ F 4- F-t-Zi _i_ a_-L ti D 7A 4- a.j-p 4-nxcpj. C*p" C F+Zl +V+^P AK
ijMtl* i I_j ' rt t + r 1+ rAt TTr thtott CiFj o Lin ~ v to ~ rti\ 




Sbjct: 776 SLAEAKSPEKAKSPVK--EEIKPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTP 833

Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA---EEAPAEVQPPPAEEAPAE 494
           + EEA P +++ P +-+ A EEA + + TE A EE + V+ A+E P +
Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPKK 893

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553
           V+ P EV+ +EAP E Q P AEE + P +++P E + EEA
Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKK---EEAKE 948

Query: 554 EVQPPPAEEAPAEV----QSLP---AEETPIEETL--AAVHSPPADDVPAEEASVD-KHS 603
           + P EE PA++ " ++ P AE+ +E + P ++VPA D K
Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008


Query: 


604 


poAriT t t TFFFPTf""FZlC.ZlFVCPP PCFrtT — PFnFlT UPHU^TRFrt^pri f, d Q 








+ EE P +A A+ P E + P+ E ++ ST+ + Q 




Sbjct: 


1009 


KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDOKDSO 1057 




Score 


= 473 


/7T 0 hi t-c\ Fynorf — A fid - AO P — A P.o — AJ 
\ I L . V JJiLoJ / CiAfJfciUL — 4.0C 4 £. t tr I.OC 4i 




Identities = 


= 184/628 (29%), Positives = 310/628 (49%) 




Query: 


18 


1PPVEKVDKEQQTYFSESEI VVTSRP DSSSTKSKEDALKHKSSGKI FASEHPEFQPA 


7 4 






T \TW -1-4-1? -1- 4--1- -1- F4- 4- 4- C 4- A 4- P 4-1 




Sbjct: 


440 


IKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 


499 


Query: 


75 


i iv j in Hi d J. rvLN i ji\ l o r lycji r\i\\j tr tr v liIiEjUCj jjr\ijCj v l v r v vyDuonvuiw rt ortnij. Lrr j 


134 






4. 4-P 4-4- 4-4- VP r J. p p 4- ~h V 4- BP 4- P4- 
t tt + t +■ IS. Ir Hj+ t. ir + rt T\ ^ rtt, t rt 




Sbjct: 


500 


ASPEKET-KSPVKEEAKSPAEAKSPA EAKSPAEAKSPAEVKSPAEVK-SPAEAKSPA 


554 


Query: 


135 


TFKFPS KTOPPT.VFFATAKAFPRPAFFTHVOVn-P^TFFTPnAFAATAVAPN^VKVOPPP 
± H* rv E tr t\i\ J- \£ tr tr iiv DLni rt i\rt lj c i\ c rt l_j t_j i n v ^ v w n o i doi l i_vrt £j rt_rt ± rt v noii jviwycc n 


193 






V OB 4-4- 4- D 4.4- G 4- Q 4- 4- 4- 4-\T-l- Dj-T 4- 4- P -4- 7A 7A 4- 4- 4-W4- P 

1\ trt\~ ii tr • ' rt > rt > ~ < iV' tr » * t t tr t rt rt~ ~ > v ~ it 




Sbjct: 


555 


EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 


614 


Query: 


194 


LFFSPT -UFrDlFTOPPQaFF^ P^-UFI T Zv F T T P P ^ Zi FF C p C f_ p p p ar T T pppnpvcpC 


2 50 






±4D 4. Dl 4.4. P 4.CD+ 4- 11174- P-4- +CD F 4- P &P4- P C P4- 




Sbjct: 


615 


EAKS PAEAKSPASVKS PGEAKS PAEAKS PAEVKSPATVKS PVEAKS PAEVKSPVTVKSPA 


674 


Query: 


251 


—\TTTT 7 rF T R Q PQ &nK"Zi D T P \7fl— DT PZIPI^ZIT IT — PZ1 Pli K\/B" P PTUF R"T 1 . Zl P\7HPT T PPPHPR 
v rLJjJ_iLjll.J-riOir ort 1 ^ J\M rlCv 1 ^,' r^ljlrrtll.'ortl-iC. tiHrHrvVdr r 1 vc J E J J.JjrtILv^rljljIrHILirttrr l . 


3 07 






4- P^4CD4"k V4- P P4-P fi4- P 4-4- P 4-D ++ IP 4D 4-4-P 




Sbjct: 


675 


EAKS PVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKS PAEAKS PAEAKPPAEAKS PA 


734 


Query: 


308 


rraR [TT nr ^TZIMF T P IF - P fl PT PPOQ P T.P-KP TTAFFA ^ AFTfYr.T.AAT F- - 

Cj Lrt 4\ £j J-j J-i O L rtJ/lCj ICrrtCi Lnr 1 lil yor bt I\ I_» 11 rt 1_« llirt O rt _L ij Jjrt rt 1 Zj 


354 






C 4- 4- T? J.DSP xaD P 4. CP P KP 4- HP CP P 




Sbjct: 


735 


EAKS PAEAKSPAEAKS PAEAKS PVEVKSPErvAKSPVKEGAKSLAEAKSPEKAKSPVKEEI 


794 


Query: 


355 


-PPAn-FTPAFAR^lPT SFFT-S AFF AH A- FVOS PT, AFFTT AFFA S AFTOT.T.AATFAPA 

c t rt U Lj L XT rt rt i\0 r LO ulj 1 O rt I_j Hj rt tlrt £j V V; 0 J7 iJrt Cj £j 1 1 rt U Ort O rt U X v/ J-J Xjrtrt _L Urt £ rt 


4 08 






PDZ14- 4-4-P 4-7A4-CD4- C F4-Z1 4.17 4-CP Z1+ PPB Zi4-T4- -1-4-4-UJ1 
t* fcrrt: ■ "tt - ^ r rt ' O t ' Cj Hi 0 i_j ■ rt *VTOr rt ' Hi H#rt rt ' X ' ' r ■ r: rt 




Sbjct: 


795 


KPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADI RSPEQVKS PA 


854 


Query: 


409 


nFTPAFAn^PT.CFFT^AFF- AP A- - FVO^ P^AKGV^T FF APT.FT.OPP^GFFTTAFF A^AA 

LJ Lj 1 tr rt Hirt O « ij O —j Lj L 0 rt Cj Lj rt XT rt uVyijr C rt I\ \J ViJL Dun t JJ Lj U\^tT L JuLu X 1 rt il>rt k>rtrt 


4 65 






P paxcp PPT P+ Q P FU+cp +PP + +pp p FP + A 




Sbjct: 


855 


KE EAKSPEKEETRTEKVAPKKEEVKSP VEEVKAK-EPPKKVE EEKTPA 


901 


Query: 


466 


TOT T TiliTFZVCflFrZlPnPVnPPPaPFaPAFVOPPPAFFflPfiFVOPPPAFFAPAFVnPPPAF 

i yij XJrtrt 4 Hirt Ort £j Hirt C rt Hi vyr t t rt Hi Hi rt IT rt H* V ^> ST tr t t\HjI-jt\ir rt Hj V r IT rt iLtCjr\tr rt H> V H L rt Hj 


525 






P4- 4-PiiPFnpaFP -i-P +++P F4- a+F a pf 




Sbjct: 


902 


TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP — KDSPGEAKKEEAKEKKAAA PEE 


956 


Query: 526 EAPAEV QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581
           E PA++ + P E +A P++ PSE + P EE PA + +E E+
Sbjct: 957 ET PAKLGVKEEAKPKEKAE DAKAKE PS K--PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014


Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636
           P EE DK P TE+ ++ + PSE+ PED+A
Sbjct: 1015 KPEEKPKMQAKAKEE DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067


Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36
Identities = 162/540 (30%), Positives = 275/540 (50%)




Query: 135 TEKFPAKIQPPLVEEATAKAEPR PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189
           TE P KI P + K+E + +E+ V V+ TEE E T E +
Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTE--EEDKEA 474


Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE--SPSE-EPPAEILPPPAE 246
           Q EEA A P AEE+ S E E P EE SP+E + PAE P
Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE--KETKSPVKEEAKS PAEAKS PAEAKS PAEA 532


Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306
           KSP+ E++SP+ K+P E + PAE ++PA+V+ P ++ AE + ++P
Sbjct: 533 KSPA EVKSPAEVKS PAEAKS -PAEA KSPAEVKSPATVKS PAEAKS PAEAKS P 583

Query: 307 REEARELQLSTAME--TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361
           E + + E +PAE ++P E +SP+ ++ AE S A ++ + PA+ ++P
Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643


Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAI EAPAD-ETPAEAQSPL 419
           AE +SP + ++ E ++ AEV+SP+ ++ AE A + ++ +++PA ++P+EA+SP
Sbjct: 644 AEVKSPATVKSPVEAKSPAEVKSPVTVKSPAE-AKSPVE VKSPASVKSPSEAKSP- 697


Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478
           + ++PAE +SP AK + ++P E +PP+ ++ AE S A A + A A+
Sbjct: 698 AGAKS PAEAKSPVVAKS PAEAKS PAEAKPPAEAKSPAEAKSPAE AKS PAEAK- 749


Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA--EVQPPPAEEAPAEVQPP 534
           +PAE + P ++P ++P E A AE+P ++P E++PP ++P + + P
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809


Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA--PAEVQSLPAEETPI EETLAAVHS PPADDV 592
           EEA + + + E + P EEA PA+++S ++P +E SP ++
Sbjct: 810 MKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKEE AKSPEKEET 866


Query: 593 PAEEASVDKHS--PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650
           E++K P + ++EP + E P + +T E++ EQP+
Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP--KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924


Query: 651 AGI PAVKLGSVVLEGEAKFEEVSK 674
           + GEAK EE +
Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948




Score = 406 (60.9 bits), Expect = 1.7e-34, P = 1.7e-34
Identities = 123/390 (31%), Positives = 213/390 (54%)




Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA EA 364
           E+ E+Q++ E EE E Q +E AEE E AT PPA+E + E
Sbjct: 455 EQTEEIQVT EEVTEEEDKEAQGE--EEEEAEEGGEEA ATTSPPAEEAASPEKET 506


Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422
           +SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP +
Sbjct: 507 KSPVKEEAKSP AEAKSPAEAKSPAEAKSPAEVKSPAEVKS PAEAKS PAEAKS PAEVK 563


Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480
           + A + + PAE +SP+ AK + ++P ++ P GE + EA + ++ + EA + + P
Sbjct: 564 SPAT VKS PAEAKS PAEAKS PAEVKSPATVKSP-GEAKS PAEAKS PAEVKSPVEA KSP 619


Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540
           AE + P + ++P E + P ++PAEV+ P ++P E + P ++P V+ P ++P
Sbjct: 620 AEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPAEAKSP 679


Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPI EETLAAVHS P PAD- DVPAEEASV 599
           EV+ P + ++PSE + P ++PAE +S ++P E A PPA+ PAE S
Sbjct: 680 VEVKS PASVKSPSEAKS PAGAKS PAEAKS PVVAKSPAEAKS PAEAKPPAEAKS PAEAKS P 739


Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659
           + PA+ E ++ EV P ++P E ++++ E +SP+ A P VK
Sbjct: 740 AEAKS PAEAKS PAE AKSPVEVKSPEKAKSPVKEG-AKSLA-EAKSPEKAKSP-VK-E 792


Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697
           + E K E +K S +K+ + + + +A TL+++S
Sbjct: 793 EIKPPAEVKSPEKAK--SPMKEEAKSPE-KAKTLDVKS 827




Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18
Identities = 124/420 (29%), Positives = 199/420 (47%)




Query: 252 ELLGEIRSPSAQKAPIEVQPLPA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306
           ELLG+I+ A +A + + A AL E A++E TV+ TL +
Sbjct: 236 ELLGQIQGCGAAQAQAQAEARDALKCDVTSALREIRAQLEGHTVQSTLQSEEWFRVRLDR 295


Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366
           EA ++ + AM + EE TE++ L TT E++ L +T+ + +E
Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT ELEALKSTKESLERQRSELED 347


Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA--EIQLLAAIEAPAD-ETPAEAQSPLSEE 422
           + S ++A ++ + L TEA+ EQL++ DEA + EE
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-RNTKWEMAAQLREYQDLLNVKMALDIEIAAYRKLLEGEE 406


Query: 423 TSAEEAPAEV QSPS-AKGVSIE-EAPLELQPPSGEETT-AEEASAAIQLLA-A 471
           P+ + PS + + ++ E +++ S +ET EE + IQ+
Sbjct: 407 CRIGFGPSPFSLTEGLPKI PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEV 466


Query: 472 TEASAEEAPAEVQPPPAEEAPAEVQP--PPAEEAPA EVQPPPAEEA--PAEVQPPPA 524
           TE +EA E + AEE E PPAEEA + E + P EEA PAE + P
Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTS PPAEEAASPEKETKSPVKEEAKS PAEAKS PAE 525

Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582

 ++PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A
Sbjct: 526 AKSPAEAKS PAEVKSPAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 

V SP PES + PA++ E ++ AE PS ++P E ++ E 
Sbjct: 585 EVKSPATVKSPGEAKSPAEAKSPAEVKSPVE AKSPAEAKSPASVKSPGEAKSPAEAK 641

Query: 642 S-TEFQSPQVAGIP 654 

S E +SP P 
Sbjct: 642 SPAEVKSPATVKSP 655 

Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18
Identities = 115/364 (31%), Positives = 166/364 (45%) 

Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE +PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 
Sbjct: 705 EAKSPVVAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762 

Query: 168 PSTEETPDAEAATAVAE--NSVKVQPPPAEEA--PL-VEFPAEIQPPSAEE--SPSVELL 220

 P ++P E A ++AE + K + P EE P V+ P + + P EE SP
Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKSPEKAKT 822

Query: 221 AEILPPSAEESPSEEP--PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE-- 275

++ P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
Sbjct: 823 LDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKE EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+EE AK P VEE E P P+ +E ++ A + AEE P TE 

Sbjct: 880 SPVEEVKAKEPPKKVEE EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936 

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE--ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

Sbjct: 937 SPGEAKKEEAKEK KAAA--PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAIEAPADETPAEAQSPLSEETSAEEAPAEVQSPSA-KGVSIEEAPLE 448

EE A + E E+ + P + + EE Q PS K E++ 

Sbjct: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

Sbjct: 1052 DQKDSQPSEKAPEDKAA 1068 

Pedant information for DKFZphtes3_17f10, frame 3



Report for DKFZphtes3_17f10.3

[LENGTH] 710

[MW] 75131.94

[pI] 4.02

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 34.08 % 

SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS 

SEG 

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc 

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhccccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG xxxx xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc 

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccccccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx . . . .xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc

SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS

SEG xxxx. . . . xxxxxxxxxxxx xxxxxxxxxxxx xxxx

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP

SEG xxxxxxxxxxx xxxxxxxxxxx xxxxxxxx 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIEliKQRPPEL

SEG 

PRD eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 
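
The PRD lines of the report above encode a per-residue secondary-structure prediction. As a rough illustration of how such a string could be summarized, the following Python sketch computes state fractions. It assumes the conventional reading of the symbols (h = helix, e = extended strand, c = coil), which the report itself does not spell out; the function name `ss_fractions` is hypothetical.

```python
from collections import Counter

def ss_fractions(prd):
    """Fraction of residues in each predicted state (assumed h/e/c alphabet)."""
    counts = Counter(prd.strip())
    total = sum(counts[state] for state in "hec")
    return {state: counts[state] / total for state in "hec"}

# Final PRD line of the report above (50 residues).
last_prd = "eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc"
fractions = ss_fractions(last_prd)
for state in "hec":
    print(state, round(fractions[state], 2))
```

Concatenating all PRD lines before calling the function would give the composition for the whole protein.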



(No Prosite data available for DKFZphtes3_17f10.3)
(No Pfam data available for DKFZphtes3_17f10.3)

DKFZphtes3_17117



group: metabolism 

DKFZphtes3_17117 encodes a novel 626 amino acid protein with similarity to transketolases (EC
2.2.1.1).

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new testis-specific
transketolase. Transketolase requires thiamin pyrophosphate as a cofactor and shows a wide
specificity for both reactants; for example, it converts hydroxypyruvate and R-CHO into CO(2) and
R-CHOH-CO-CH(2)OH.

The new protein can find application in the modulation of metabolic pathways involving this
transketolase activity and as a new enzyme for biotechnological production processes.



strong similarity to transketolases 

few EST hits (all from testis or pooled libraries containing testis)
testis specific transketolase? 

Sequenced by GBF 

Locus : unknown 

Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630 
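
As an illustration of how a polyadenylation signal such as the one noted above might be located, the following Python sketch scans a sequence for the hexamers AATAAA and ATTAAA. The function name `find_polya_signals`, the choice of these two hexamers, and the 1-based coordinates are assumptions made for this example; the source does not say which tool or coordinate convention produced the listed positions. The test fragment is bases 2621 to 2660 of the insert sequence listed below.

```python
import re

# Canonical signal and its most common variant (an assumed set for this sketch).
SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq, offset=1):
    """Return (position, hexamer) pairs; positions are 1-based when offset=1."""
    # A lookahead is used so that overlapping candidates are not skipped.
    pattern = re.compile("(?=(" + "|".join(SIGNALS) + "))")
    return [(m.start() + offset, m.group(1)) for m in pattern.finditer(seq.upper())]

# Bases 2621-2660 of the DKFZphtes3_17117 insert, copied from the listing.
fragment = "AAGTTAATGT" "AATAAAGCAT" "TTAAAAATCA" "GAAAAAAAAA"
hits = find_polya_signals(fragment, offset=2621)
print(hits)
```

On this fragment the sketch finds AATAAA starting at 1-based position 2631, one base after the listed position 2630. That would be consistent with a zero-based coordinate in the original annotation, but this is an inference rather than something the source states.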



1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT 
51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG 
101 CCACGTGTGC CTCTGGTTCT GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 
151 GAGGTCGTGT CTGTCCTCTT CTTCCACACG ATGAAGTATA AACAGACAGA 
201 CCCAGAACAC CCGGACAACG ACCGGTTCAT CCTCTCCAGG GGACATGCTG 
251 CTCCTATCCT CTATGCTGCT TGGGTGGAGG TGGGTGACAT CAGTGAATCT 
301 GACTTGCTGA ACCTGAGGAA ACTTCACAGC GACTTGGAGA GACACCCTAC 
351 CCCGCGATTG CCGTTTGTTG ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 
401 TAGGTACTGC ATGTGGAATG GCTTATACTG GCAAGTACCT TGACAAGGCC
451 AGCTACCGGG TGTTCTGCCT TATGGGAGAT GGCGAATCCT CAGAAGGCTC 
501 TGTGTGGGAG GCTTTTGCTT TTGCCTCCCA CTACAACTTG GACAATCTCG 
551 TGGCGGTCTT CGACGTGAAC CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 
601 GAGCATGGCG CAGACATCTA CCAGAATTGC TGTGAAGCCT TTGGATGGAA 
651 TACTTACTTA GTGGATGGCC ATGATGTGGA GGCCTTGTGC CAAGCATTTT 
701 GGCAAGCAAG TCAAGTGAAG AACAAGCCTA CTGCTATAGT TGCCAAGACC 
751 TTCAAAGGTC GGGGTATTCC AAATATTGAG GATGCAGAAA ATTGGCATGG 
801 AAAGCCAGTG CCAAAAGAAA GAGCAGATGC AATTGTCAAA TTAATTGAGA 
851 GTCAGATACA GACCAATGAG AATCTCATAC CAAAATCGCC TGTGGAAGAC 
901 TCACCTCAAA TAAGCATCAC AGATATAAAA ATGACCTCCC CACCTGCTTA 
951 CAAAGTTGGT GACAAGATAG CTACTCAGAA AACATATGGT TTGGCTCTGG 
1001 CTAAACTGGG CCGTGCAAAT GAAAGAGTTA TTGTTCTGAG TGGTGACACG 
1051 ATGAACTCCA CCTTTTCTGA GATATTCAGG AAAGAACACC CTGAGCGTTT 
1101 CATAGAGTGT ATTATTGCTG AACAAAACAT GGTAAGTGTG GCACTAGGCT 
1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 
1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 
1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 
1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 
1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 
1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG
1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 
1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 
1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 
1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 
1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 
1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 
1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 
1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 
1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 
1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 
1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 
2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 
2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 
2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 
2151 AAATAAAACA ACTACCTAAT ACAAATATTT CTGATAAGAC TACAAATATC 
2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 
2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 
2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 
2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 
2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA
2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTCAGA
2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA
2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA
2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA
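
The listing above uses a position index followed by groups of ten bases. A minimal Python sketch for turning such a listing into one contiguous string is shown here; the function name `parse_numbered_sequence` is hypothetical, and the approach simply keeps nucleotide letters, so it also tolerates index digits that were split by scanning.

```python
def parse_numbered_sequence(text):
    """Join a position-numbered, 10-base-grouped listing into one string."""
    bases = []
    for line in text.splitlines():
        # Keep only nucleotide letters; this drops the position index,
        # the spaces between groups, and any stray digits.
        bases.append("".join(ch for ch in line.upper() if ch in "ACGTN"))
    return "".join(bases)

# First two rows of the DKFZphtes3_17117 listing.
listing = """
1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT
51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG
"""
seq = parse_numbered_sequence(listing)
print(len(seq))     # 100
print(seq[12:15])   # ATG, the first codon of the ORF reported at 13 bp
```

Applied to the complete listing, the resulting length can be compared with the stated insert length of 2688 bp as a quick check that no bases were lost in transcription.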



BLAST Results 



No BLAST result 



Medline entries 



96214928:

Amplification of the transketolase gene in desensitization-resistant
mutant Y1 mouse adrenocortical tumor cells.

99123875:

Properties and functions of the thiamin diphosphate dependent enzyme
transketolase.



Peptide information for frame 1 



ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATP_GTP_A (595-603)



1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS 

51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN 

101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV 

151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA 

201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR 

251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI 

301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST 

351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA 

401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF 

451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR 

501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII 

551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG 

601 KTSELLDMFG ISTRHIIAAV TLTLMK 
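
The reported ORF coordinates and peptide length can be cross-checked against the nucleotide listing. The Python sketch below does this under two assumptions that are not stated in the source: that the coordinates are 1-based and inclusive, and that the standard genetic code applies. The helper names `orf_codons` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def orf_codons(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3

def translate(dna):
    """Translate a DNA string codon by codon ('X' marks unknown codons)."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

print(orf_codons(13, 1890))                          # 626
# Bases 13-42 of the insert listing above.
print(translate("ATGATGGCCAACGACGCCAAGCCCGACGTG"))   # MMANDAKPDV
```

Both results agree with the report: 13 to 1890 spans 626 codons, and the first ten codons translate to the first ten residues of the peptide. In the listing, the stop codon TAA follows immediately at 1891 to 1893, so the stated ORF end appears to exclude it.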



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_17117, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1,
Score = 2222, P = 2.5e-230

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score =
2202, P = 3.3e-228

TREMBL:RN0 925 6_1 product: "transketolase"; Rattus norvegicus
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202,
P = 3.3e-228

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score =
2200, P = 5.3e-228



>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 
Length = 623 

HSPs: 

Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66

 KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP +
Sbjct: 6 KPDQQKLQALKDTANRLRISSIQATTAAGSGHPTSCCSAAEIMAVLFFHTMRYKALDPRN 65

Query: 67 PDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVDVATGSL 126

 P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL
Sbjct: 66 PHNDRFVLSKGHAAPILYAVWAEAGFLPEAELLNLRKISSDLDGHPVPKQAFTDVATGSL 125

Query: 127 GQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLVAVFDVN 186

 GQGLG ACGMAYTGKY DKASYRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N
Sbjct: 126 GQGLGAACGMAYTGKYFDKASYRVYCMLGDGEVSEGSVWEAMAFAGIYKLDNLVAIFDIN 185

Query: 187 RLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPTAIVAKT 246

 RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT
Sbjct: 186 RLGQSDPAPLQHQVDIYQKRCEAFGWHTIIVDGHSVEELCKAFGQA---KHQPTAIIAKT 242

Query: 247 FKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQISITDIK 306

 FKGRGI IED E WHGKP+PK A+ I++ I SQ+Q+ + ++ P ED+P + I +I+
Sbjct: 243 FKGRGITGIEDKEAWHGKPLPKNMAEQIIQEIYSQVQSKKKILATPPQEDAPSVDIANIR 302

Query: 307 MTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHPERFIEC 366

 M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC
Sbjct: 303 MPTPPSYKVGDKIATRKAYGLALAKLGHASDRIIALDGDTKNSTFSELFKKEHPDRFIEC 362

Query: 367 IIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSHCGVSTG 426

 IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G
Sbjct: 363 YIAEQNMVSIAVGCATRDRTVPFCSTFAAFFTRAFDQIRMAAISESNINLCGSHCGVSIG 422

Query: 427 EDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPETAVIYT 486

 EDG SQMALEDLAMFRS+P TVFYPSD ++TE A+ LAANTKG+CFIRTS+PE A+IY+
Sbjct: 423 EDGPSQMALEDLAMFRSVPMSTVFYPSDGVATEKAVELAANTKGICFIRTSRPENAIIYS 482

Query: 487 PQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFTIKPLDA 546

 E+F++GQAKVV +D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD
Sbjct: 483 NNEDFQVGQAKVVLKSKDDQVTVIGAGVTLHEALAAAESLKKDKISIRVLDPFTIKPLDR 542

Query: 547 ATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRGKTSELL 606

 I+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V +LAVS VP+ GK +ELL
Sbjct: 543 KLILDSARATKGRILTVEDHYYEGGIGEAVSAAVVGEPGVTVTRLAVSQVPRSGKPAELL 602

Query: 607 DMFGISTRHIIAAV 620

 MFGI I+ AV
Sbjct: 603 KMFGIDKDAIVQAV 616
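
The Identities figure of an HSP is the number of alignment columns in which query and subject carry the same residue. The Python sketch below counts identities for one aligned pair and reproduces the percentages printed in the HSP header; the function names are hypothetical, and the use of truncation rather than rounding is an assumption inferred from the figures in this report (417/614 is about 67.9%, printed as 67%). Positives are not computed here because they depend on the substitution matrix used by the search.

```python
def count_identities(query, subject):
    """Count identical aligned residues and aligned columns (gaps are '-')."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)

def report_percent(numerator, denominator):
    """Integer percentage, truncated (assumed to mirror the report style)."""
    return 100 * numerator // denominator

# Last block of the TKT_MOUSE alignment (query 607-620, subject 603-616).
ident, cols = count_identities("DMFGISTRHIIAAV", "KMFGIDKDAIVQAV")
print(ident, cols)                # 7 14
print(report_percent(417, 614))   # 67
print(report_percent(501, 614))   # 81
```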



Pedant information for DKFZphtes3_17117, frame 1 



Report for DKFZphtes3_17117 . 1 



[LENGTH] 626
[MW] 67877.52
[pI] 5.90
[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0
[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32
[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17
[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09
[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05
[BLOCKS] BL00801F
[BLOCKS] BL00801E
[BLOCKS] BL00801D Transketolase proteins
[BLOCKS] BL00801C Transketolase proteins
[BLOCKS] BL00801B Transketolase proteins
[BLOCKS] BL00801A Transketolase proteins
[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21
[EC] 1.2.4.1 Pyruvate dehydrogenase (lipoamide) 8e-11
[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10
[EC] 2.2.1.1 Transketolase 0.0
[EC] 2.2.1.3 Formaldehyde transketolase 1e-20
[PIRKW] transferase 0.0
[PIRKW] flavoprotein 2e-07
[PIRKW] Calvin cycle 1e-40
[PIRKW] heterotetramer 2e-07
[PIRKW] pentose phosphate pathway 0.0

[PIRKW] magnesium 1e-40
[PIRKW] thiamine pyrophosphate 0.0
[PIRKW] oxidoreductase 7e-12
[PIRKW] fatty acid biosynthesis 4e-10
[PIRKW] mitochondrion 2e-07
[PIRKW] peroxisome 1e-20
[PIRKW] homodimer 1e-40
[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06
[SUPFAM] pyruvate dehydrogenase (lipoamide) beta chain 7e-12
[SUPFAM] ferredoxin 2[4Fe-4S]-related protein 8e-47
[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0
[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08
[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47
[SUPFAM] hypothetical protein C2814 2e-21
[SUPFAM] transketolase 0.0
[PROSITE] ATP_GTP_A 1
[PFAM] Transketolase
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 3.04 %

SEQ MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK 

SEG 

IngsB HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT 



SEQ QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD 

SEG 

IngsB TTTTTTTTTCEEEETTGGGHHHHHHHHHHHCTTCHHHHHTTTTTTTTTTTTTTTTTTTTC 

SEQ VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFAFASHYNLDNLV

SEG 

IngsB CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTTEE 

SEQ AVFDVNRLGQSGPAPLEHGADIYQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT 

SEG 

IngsB EEEEECCEETTEEGGGCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE 

SEQ AIVAKTFKGRGIPNIEDAENWHGKPVPKERADAIVKLIESQIQTNENLIPKSPVEDSPQI

SEG 

IngsB EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHH 

SEQ SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKEHP 

SEG 

IngsB HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHKHHTTTTTEEEEETTTHHHHCCTTCEECCG 

SEQ ERFIECIIAEQNMVSVALGCATRGRTIAFAGAFAAFFTRAFDQLRMGAISQANINLIGSH 

SEG xxxxxxxxxxxxxxxxxxx 

IngsB GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGKHHHHHHHHHCTTTEEEEEC 

SEQ CGVSTGEDGVSQMALEDLAMFRSIPNCTVFYPSDAISTEHAIYLAANTKGMCFIRTSQPE

SEG 

IngsB CCGGGTTTTTTTTCCHHHHHHHCTTTTEEECCCCHHHHHHHHHHHTTTTCEEEECCCCCB 

SEQ TAVIYTPQENFEIGQAKVVRHGVNDKVTVIGAGVTLHEALEAADHLSQQGISVRVIDPFT 

SEG 

IngsB CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE. . . . 

SEQ IKPLDAATIISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG 

SEG 

IngsB 

SEQ KTSELLDMFGISTRHIIAAVTLTLMK

SEG 

IngsB 



Prosite for DKFZphtes3_17117.1
PS00017 595->603 ATP_GTP_A PDOC00017
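
The ATP_GTP_A entry refers to the PROSITE P-loop pattern [AG]-x(4)-G-K-[ST]. As a sketch of how such a hit can be verified by hand, the Python snippet below converts that pattern to a regular expression and scans the C-terminal part of the predicted peptide. The function name `find_p_loops` and the 1-based inclusive coordinates are choices made for this example only.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

def find_p_loops(peptide, offset=1):
    """Return (start, end, match) triples with 1-based inclusive coordinates."""
    return [(m.start() + offset, m.end() + offset - 1, m.group())
            for m in P_LOOP.finditer(peptide)]

# Residues 551-626 of the predicted peptide, copied from the listing above.
tail = ("SSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG"
        "KTSELLDMFGISTRHIIAAVTLTLMK")
matches = find_p_loops(tail, offset=551)
print(matches)
```

The eight-residue pattern matches GVPQRGKT at residues 595 to 602. The report lists the range as 595 to 603, one residue longer, which may reflect a different end-coordinate convention in the reporting tool; the source does not say.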



Pfam for DKFZphtes3_17117 . 1 
HMM_NAME Transketolase 

HMM *vNtIRiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN
+N++RI ++ A + +SG ++++++A++- VL++++M+++++DP P+
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT
+DRF+LS GHA+++LY+ W + G ++++DL+++R++HS++ +HP ++
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117

HMM PGVEVTTGPLGQGIaNaVWMAIAERnLAATYNRPGFDI f DHYTYCFMGDG
++ +V+TG+LGQG++ +++++Y++++ D+++++++C+MGDG
Query 118 FV-DVATGSLGQGLG TACGMAYTGK YL DKASYRVFCLMGDG 157

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSIDGdTdlWFqEDtYakRF
+ +EG++WEA ++A+H++L+N++A +D NR++++G++++ + D+Y+ +
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADIYQNCC 207

HMM EAYGWHVIEVEnDGHDvEelcaAIEeAKaekDRPTLliCRTVIGYGSPNk
EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN
Query 208 EAFGWNTYLV--DGHDVEALCQAFWQASQVKNKPTAIVAKTFKGRGIPNI 255

HMM QGTHdWHGAPLGeD*
++ + WHG+P +++
Query 256 EDAENWHGKPVPKE 269

HMM * PqWePnddklATRKASQqaLeaiGPaLPEf WGGSADLTPSNLTrWKGmv
P++++ +DKIAT K+++ AL+++G A +++ +S+D+ +S++++++ ++
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEIFRKE 358

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGI AIHGgNFRPYGGT
+ + R+I++ I+E++M++++ G+A++G+ ++++ G
Query 359 H PERFIECIIAEQNMVSVALGCATRGR-TIAFAGA 392

HMM FMMFyDYARPAIRMAALMelPVIWVWTHDSIGLGEDGPTHQPVEHLAHFR
F++F+++A++++RM A++ +++++++H-+++ GEDG +++++E+LA+FR
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442

HMM alPNMsVWRPCDgNETayAWylAvSRehTPtiLILSRQNLPQIErNPrqf
+IPN +V++P+D+ T+ A YLA+++++ +++++S ++ +++++ P +
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490

HMM ekvaRGGYVLkDmdnePDVILIATGSEMELAvaAAKlLadEGIkaRVVSM
++++++++V + + + V++I++G+++++A++AA+ L+ +GI +RV+++
Query 491 FEIGQAKVVRHGVN--DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538

HMM PCTeWFD kQDeEYReSVLPDhVPqRVaVEmGvtWCWYKYVGqq
++++++D + ++++R +++DH++ +++++++V ++ +++ +
Query 539 FTIKPLDAATIISSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587

HMM Gal f GMNr FGESSGKAPpevLYkMFGFTPENI*
+ + + + + + + + + +L+ MFG+ +I
Query 588 VHQLAVSGVPQR GKTSELLDMFGISTRHI 616



PCT/IB00/01496

DKFZphtes3_17nl2



group: transcription factors 

DKFZphtes3_17nl2.1 encodes a novel 804 amino acid protein which is nearly identical to mouse
and trout SOX-LZ.

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the
regulation of developmental processes such as germ layer formation, organ development and cell type
specification. Deletion or mutation of Sox proteins often results in developmental defects and
congenital disease in humans. Sox proteins perform their function in a complex interplay with
other transcription factors in a manner highly dependent on cell type and promoter context.
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper.

The new protein can find application in modulating/blocking the expression of SOX-controlled
genes.



nearly identical to mouse SOX-LZ 

complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis 

Sequenced by GBF

Locus: unknown 

Insert length: 2802 bp 

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660



1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA 
501 CGAACGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAACCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA 
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 
1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT 
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA 
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG 
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA
2451 TGGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG 
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA 
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG 
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AA 



BLAST Results 



No BLAST result 



Medline entries 



95311974:

A gene that is related to SRY and is expressed in the testes
encodes a leucine zipper-containing protein.

96032826:

The Sry-related HMG box-containing gene Sox6 is expressed in the
adult testis and developing nervous system of the mouse.



Peptide information for frame 1 



ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 



1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA
801 VSAN



BLAST P hits 



Entry MMSOXLZ2_1 from database TREMBL:
product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds.
Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801

Entry 151083 from database PIR:
SOX-LZ - rainbow trout
Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532

Entry S59121 from database PIR:
SOX6 protein - mouse
Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660

Entry AB006330_1 from database TREMBL:
gene: "mSox5L"; product: "SOX5"; Mus musculus mSox5L mRNA, complete cds.
Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457

Entry MMU010604_1 from database TREMBL:
gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for transcription factor L-Sox5
Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281
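The hit summaries above all follow one pattern (Score, P, identities, positives), so they can be turned into comparable numbers mechanically. The following is only a minimal sketch: the function name `parse_hit` and the dictionary keys are illustrative assumptions, not part of the original report or of any BLAST tool.

```python
import re

# Pattern for the summary lines used in this listing, e.g.
# "Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801"
HIT_RE = re.compile(
    r"Score\s*=\s*(\d+),\s*P\s*=\s*([0-9.eE+-]+),\s*"
    r"identities\s*=\s*(\d+)/(\d+),\s*positives\s*=\s*(\d+)/(\d+)"
)


def parse_hit(line):
    """Return score, P value and identity/positive fractions, or None."""
    m = HIT_RE.search(line)
    if m is None:
        return None
    score, p, ident, ident_len, pos, pos_len = m.groups()
    return {
        "score": int(score),
        "p_value": float(p),
        "identity": int(ident) / int(ident_len),    # fraction of aligned columns
        "positives": int(pos) / int(pos_len),
    }


if __name__ == "__main__":
    hit = parse_hit("Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801")
    print(f"{hit['identity']:.1%} identical over the aligned region")
```

Expressed this way, the mouse SOX-LZ hit is identical to the query at 764 of 801 aligned positions, which is what places the protein in the category "strong similarity to known protein".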
Alert BLASTP hits for DKFZphtes3_17nl2, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl2, frame 1 

Report for DKFZphtes3_17nl2.1

[LENGTH] 804
[MW] 89332.69
[pI] 6.97
[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YPR065w] 5e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04
[SCOP] dlhmf 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13
[SCOP] dllefa_ 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15
[SCOP] dlhrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17
[PIRKW] DNA binding 4e-94
[PIRKW] T-cell receptor 4e-07
[PIRKW] leucine zipper 1e-38
[PIRKW] alternative splicing 2e-07
[PIRKW] transcription factor 4e-16
[PIRKW] transcription regulation 1e-12
[SUPFAM] HMG box homology 0.0
[SUPFAM] unassigned HMG box proteins 4e-94
[PROSITE] ATP_GTP_A 1
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 6
[PFAM] HMG (high mobility group) box
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 13.81 %
[KW] COILED_COIL 3.48 %



SEQ MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP 

SEG 

COILS 

lnhm- 

SEQ TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF 

SEG 

COILS 

lnhm- 

SEQ GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI 

SEG 

COILS 

lnhm- 

SEQ KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI 

SEG xxxxxxxxxxxxxxx 

COILS CCCCCC 

lnhm- 

SEQ ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY 

SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCC 

lnhm- 

SEQ KPGDNYPVQFIPSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV

SEG xxxxxxxxxxxx 
COILS

lnhm-

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL 

SEG 

COILS 

lnhm- 

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR 

SEG . . . xxxxxxxxxxxxxxxxxx 

COILS 

lnhm- 

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG 

SEG . . xxxxxxxxxxxx 

COILS 

lnhm- 

SEQ PGVIDLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP 

SEG 

COILS 

lnhm- CCC 

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK 

SEG X 

COILS 

lnhm- CCCHHHHHHHHHHHHHHHTTTTCCHHHHHHHHHHHTTTTTTHHHHHHHHHHHHHHHHHHH 

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQIPITTGTGVVYP 

SEG xxxxxxxxxxxx 

COILS 

lnhm- HHHTTTTTTT 

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY 

SEG xxxxxxx 

COILS 

lnhm- 

SEQ DDYEDDPKSDYSSENEAPEAVSAN 

SEG xxxxxx 

COILS 

lnhm- 
PS00001 97->101 ASN_GLYCOSYLATION PDOC00001
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006
PS00008 182->188 MYRISTYL PDOC00008
PS00008 431->437 MYRISTYL PDOC00008
PS00008 437->443 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 575->581 MYRISTYL PDOC00008
PS00008 762->768 MYRISTYL PDOC00008
PS00009 677->681 AMIDATION PDOC00009
PS00017 526->534 ATP_GTP_A PDOC00017
PS00029 187->209 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphtes3_17nl2.1



HMM_NAME HMG (high mobility group) box 

HMM * PKRPMNAYMLWMQEMRe kl KaENPNdMhNtEISKMiGEMWKnMsEEEKm 

+KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK+ 
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 

HMM PYEdMAeeEKqRYMKEMPeYK* 

PY+++ +++ + 4++ +P+YK 
Query 645 PYYEEQARLSKIHLEKYPNYK 665

DKFZphtes3_17nl8



group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known
proteins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent
receptor protein signature 1. In E. coli, the TonB protein interacts with outer membrane
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In
the absence of TonB, these receptors bind their substrates but do not carry out active
transport. The novel protein seems to be involved in ATP-dependent transport of substances
into the cell.

The new protein can find application in modulation of cell-permeability and transport of 
suitable substrates into the cell. 



unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern

Sequenced by GBF 

Locus: unknown 

Insert length: 2853 bp 

Poly A stretch at pos. 2806, no polyadenylation signal found



1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA
451 GCACAGCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC
1651 CCAAGCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG
1951 ACGTGGAGCT GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA
2451 TTGGCCCTGG AAGACTATGT GGAtJAAGGAG TTATCTCTGG AGGCTGAGAA
2501 GACAAGAGAG CCTGAAGTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA
2551 TAACTAGTTG GAAGAAGCAG GCCTCCAAGA AGTAGCGCCA TCCTGGCAGC
2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC
2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGI CCTCCAAGCT TCTATAATAA
2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG
2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG
2851 CCG



BLAST Results 



NO BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 237 bp to 2582 bp; peptide length: 782 

Category: putative protein 

Prosite motifs: ATP_GTP_A (122-130) 

TONB_DEPENDENT_REC_1 (1-44)



1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 

51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP 

101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS

151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL 

201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 

251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 

301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 

351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR

401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT 

451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD

501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK 

551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT 

601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 

651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG 

701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS

751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 
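The ORF coordinates and the peptide above can be cross-checked against the nucleotide listing. The sketch below is an illustration only: it assumes the standard genetic code, takes the 30 bases starting at the ATG at position 237 (read across the columns of the rows labelled 201 and 251 of the sequence listing), and uses hypothetical helper names `orf_summary` and `translate`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def orf_summary(start, end):
    """Frame (1-3) and codon count for an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    return {"frame": (start - 1) % 3 + 1,
            "codons": length // 3,
            "remainder": length % 3}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring spaces between blocks."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    # "ORF from 237 bp to 2582 bp; peptide length: 782", reported for frame 3
    print(orf_summary(237, 2582))
    # Bases 237-266 of the insert
    print(translate("ATGG CCCGTCAGGT GCGCACCCAC CAGGAG"))
```

Under these assumptions the ORF spans 2346 bases, i.e. 782 complete codons in frame 3, and the first ten codons translate to the first ten residues of the peptide listing.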



BLAST P hits 



No BLAST P hits available 



Alert BLASTP hits for DKFZphtes3_17nl8, frame 3 



No Alert BLASTP hits found 



Pedant information for DKFZphtes3_17nl8, frame 3 



Report for DKFZphtes3_17nl8.3

[LENGTH] 782
[MW] 88030.16
[pI] 9.22
[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TONB_DEPENDENT_REC_1 1
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 4
[KW] Alpha_Beta

SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK 

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL 

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE 

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeecccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL 

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT 

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYNHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKNSVVQGM 

PRD ccchhhhhhhhhhhhhcccccccceeeecccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc
PS00001 91->95 ASN_GLYCOSYLATION PDOC00001
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001
PS00001 379->383 ASN_GLYCOSYLATION PDOC00001
PS00001 598->602 ASN_GLYCOSYLATION PDOC00001
PS00004 403->407 CAMP_PHOSPHO_SITE PDOC00004
PS00004 511->515 CAMP_PHOSPHO_SITE PDOC00004
PS00004 652->656 CAMP_PHOSPHO_SITE PDOC00004
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005
PS00005 177->180 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 450->453 PKC_PHOSPHO_SITE PDOC00005
PS00005 497->500 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 523->526 PKC_PHOSPHO_SITE PDOC00005
PS00005 631->634 PKC_PHOSPHO_SITE PDOC00005
PS00005 723->726 PKC_PHOSPHO_SITE PDOC00005
PS00005 774->777 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 131->135 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 406->410 CK2_PHOSPHO_SITE PDOC00006
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006
PS00006 466->470 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 571->575 CK2_PHOSPHO_SITE PDOC00006
PS00006 693->697 CK2_PHOSPHO_SITE PDOC00006
PS00006 717->721 CK2_PHOSPHO_SITE PDOC00006
PS00008 145->151 MYRISTYL PDOC00008
PS00008 327->333 MYRISTYL PDOC00008
PS00008 592->598 MYRISTYL PDOC00008
PS00008 734->740 MYRISTYL PDOC00008
PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013
PS00017 122->130 ATP_GTP_A PDOC00017
PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354
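The PROSITE hits listed above are simple sequence patterns, so a subset can be reproduced with regular expressions. The sketch below is illustrative rather than a reimplementation of the tool that produced the report: the regular expressions are hand translations of the published PROSITE patterns (PS00001 N-{P}-[ST]-{P}, PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE], PS00017 [AG]-x(4)-G-K-[ST]), the helper name `scan` is hypothetical, and the coordinate convention (1-based start, end equal to start plus motif length) is inferred from the table.

```python
import re

# Hand-translated PROSITE patterns (assumption: standard pattern definitions).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "ATP_GTP_A": r"[AG].{4}GK[ST]",
}

# First 150 residues of the DKFZphtes3_17nl8 frame 3 peptide.
N_TERM = (
    "MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRR"
    "FVEASQLLHLNAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSP"
    "YQLIYHSSTACLSFSLSAGKEAKKKIGKSRTTEDVSMPPLHRGVGTPANS"
)


def scan(seq, pattern):
    """Return (start, end) pairs; a lookahead is used so overlapping hits are kept."""
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1                      # 1-based start
        hits.append((start, start + len(m.group(1))))
    return hits


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for start, end in scan(N_TERM, pattern):
            print(f"{start}->{end} {name}")
```

On this fragment the scan reports, for example, the P-loop at 122->130 and a PKC site at 48->51, in agreement with the corresponding rows of the table. Such short patterns match frequently by chance, which is consistent with the annotation treating only the P-loop and TonB signatures as informative.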



(No Pfam data available for DKFZphtes3_17nl8.3)

DKFZphtes3_18f3



group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to human
TNF-inducible protein CG12-1.

The novel protein contains two leucine zippers.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to TNF-inducible protein CG12-1

Sequenced by MediGenomix 

Locus: unknown

Insert length: 4608 bp

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550

1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 
51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 

101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 

151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 

201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 

251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG 

301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 

351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 

401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT 

451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG

501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 

551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 

601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 

651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 

701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 

751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 

801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 

851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 

901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 

951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 
1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 
1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 
1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 
1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 
1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC 
1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 
1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 
1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 
1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 
1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA 
1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC 
1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 
1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 
1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTGTAGTC 
1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 
1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 
1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 
1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 
1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 
1951 CGTTCACAGA TGACCAAGGA CAGACTGTGT CCCAGAAGCC AAAATGAGAG 
2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT 
2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 
2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 
2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 
2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 
2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 
2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 
2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 
2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 
2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC 
2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA 
2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT
2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT
2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA
2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT 
2751 CTGTAAGAAA CTGTCAGTGA AAATATGTAC AATTCCTTCA ATTTCCATTC
2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC
2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT
2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA
2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT
3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA
3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC
3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC
3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC
3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC
3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA
3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA
3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC
3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA
3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT
3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT
3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC
3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC
3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA
3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG
3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT
3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT
3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG
3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT
3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA
4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG
4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT
4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA
4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT
4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT
4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC
4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG
4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA
4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT
4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT
4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA
4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4601 AAAAAAGG
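The header of this entry reports a polyadenylation signal at pos. 4550 and a poly A stretch at pos. 4570. The sketch below shows one way such positions could be located; it is not the method used by the original annotation pipeline. It assumes the last rows of the listing (labelled 4501 to 4601) are read across their columns, that the signal is the canonical AATAAA hexamer, and that a run of at least ten A residues counts as a poly A stretch; the helper names and the `min_run` threshold are hypothetical.

```python
import re

# Bases 4501-4608 of the insert; TAIL_OFFSET is the 0-based index of base 4501.
TAIL_OFFSET = 4500
TAIL = (
    "TGAGTTCATT" "TTTTCCCACT" "GTAGCAAAAT" "TAATGCTTTC" "TCTTTATTGA"
    "AATAAATTGC" "TCATTCCTCC" + "A" * 30 + "AAAAAAGG"
)


def find_signal(seq, offset=0, motif="AATAAA"):
    """0-based position of the first occurrence of the motif, or None."""
    i = seq.find(motif)
    return None if i < 0 else offset + i


def find_poly_a(seq, offset=0, min_run=10):
    """0-based position where the first run of at least min_run A's begins, or None."""
    m = re.search("A{%d,}" % min_run, seq)
    return None if m is None else offset + m.start()


if __name__ == "__main__":
    print("signal:", find_signal(TAIL, TAIL_OFFSET))
    print("poly A:", find_poly_a(TAIL, TAIL_OFFSET))
```

With zero-based counting, both values coincide with the positions given in the entry header, which suggests (but does not prove) that the reported positions are zero-based offsets into the insert.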



BLAST Results 



Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 

Entry HS073350 from database EMBL: 
human STS EST303564. 
Score = 1417, P = 8.7e-58, identities = 285/287 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2

PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score = 155, P = 4.5e-10

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10

>PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments)
Length = 779 

HSPs: 



Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10 
Identities = 60/152 (39%), Positives = 67/152 (44%) 



Query: 


7 


rxr n rr DPAAftTAP PA A tat pttaa CPPRPa APPCA — APARfinPAPnAPAOATiPR^ORGR 


€2 




G+ G PG + AR PG GPP PA P GA AP G A A P SQ 




Sbjct: 


230 


GDLGAPGPSGARGERGFPGERGVEGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAP 


289 


Query : 


63 


r\j apDMrDDDtjuorar snDrwDrnT 7Aai^Uf^RC7ii^nCr^"^RRCRHHHVR^T.AnT.7.nT,pn;AAT^ 


122 






L G P RGA PG GD +GA G + G VR L + PG A 




Sbjct: 


290 


GL QGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDG VRGLTGPIGPPGPAG 


341 


Query: 


123 


pDrnuruT _p_r"P nARnpPT.PRVFT.PT.Ad.RCiPPAA 1 ^fi 






GD+G P GP D +P P P AG GPP A 




Sbjct: 


342 


APGDKGEAGPSGPAGTRGAPGDRGEPGPPG P-AGFAGPPGA 381 




Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05
Identities = 52/154 (33%), Positives = 60/154 (38%)




Query : 


7 


/^T7Ti^~'i"n/^n7iri7TiDQ7\ , j\ at n^TA 7\ P D DDDTi AP P/"" AJiDADr f""ORPf""71PIlf'UiTPRCnDf'^ 
GiijAtjOrOAAWAKKAAALr^l AAor rKr AArrrt-jAAlrAMla brrtrbttrftiJALrrtayKu 


61 






G G PGAA R P AGPP P P G ++G GPA G P + P G 




Sbjct: 


434 


GATGFPGAA-GRVGPPGP3GNAGPPGPPGPAGKEGSKGPRGETGPA-GRPGEVGPPGPPG 


491 


Query: 


62 


RQLAERNGRPRRHRGALAUFtjHFbUijAAbVuKOAOtjljn 


121 






AGP G PG PG RG G +RG R L PG + 




Sbjct: 


492 


P--AGEKGAPGAD-GPAGAPGTPGPQGIAGQRGVVGLPGQRGE RGFPGL PGPS 


541 


Query: 


122 


E , ^7\/" , T">Di'" , tJT PfDT"17APnP£?T DD^EV DT BiTP("'DD'A7AQUPTr 1 Sfl 








G +G R P P + GL GPP + RE 




Sbjct: 


542 


GEPGKQGPSGASGERGPPGP MGPPGLAGPPGESGRE 57 7 




Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 52/148 (35%), Positives = 62/148 (41%)




Query: 


7 


n/"Ti 7\T.77\ D OT\ 7\ 7\T nf""T , 7\7\/" , DPPP7Aft DDrAT D7\P/"l"'D7iPf*'7j,D7lO?lT DDC OP C2 — P 

OjhjA(j(jir(aAAW AKKAAALit'Lr 1 AAbr r KrAH - - - lrr'VjAArAKuL>irftrlaAr'/iyrtJjr'i\&yr\Va l\ 


62 






G G PG AR +A PG AGP A PPG + GP PG P A +G R 




Sbjct: 


416 


GNVGAPGPKGARGSAGPPG-ATGFPGAAGRVGPPGPS-GNAGP-PGPPGPAGKEGSKGPR 


472 


Query: 


63 


yLA&RNGRr RRHKGALA(Jr^rtF^UijAA^v*jK^A^^unbKK^Krl MM v noJjAL'LiJj'JJjr'UA 


120 






GRP G + PG PG GA G G + ++ LPG 




Sbjct: 


473 


GETGPAGRP GEVGPPGPPGPAGEKGAPGADGPAGAPGTPGPQGIAGQRGVVGLPGQ 


528 


Query: 


121 


nrrnrnDru t DrDnsnnDri —DDI/ITT DT T\t~T crDD "1 ^ *1 








G+RG LPGP + P +G RGPP 




Sbjct : 


529 


R GERGFPGLPGPSGEPGKQGPS GASGERGPP 559 




Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 54/162 (33%), Positives = 64/162 (39%)




Query: 


7 


rf arror nawnoDniiaT or~T acrDDB D A uppr nn P ARC CZT> A Pf^ A PA DAT. PR ^ OR 


60 




G G PG + PG A+GP P PPG GGAPGP+P + 




Sbjct: 


29 


GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 


88 


Query: 


61 


r-RDT AP"RNPR P RRHRHAT AOPGHPf^ni, AAGVGRGAGGGH ^RRGRHHHV — RSLADLL 


115 




G R L G P + HRG G GD +G G G + R L 




Sbjct: 


89 


GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRGLPGFP 


148 


Query : 


116 


QLPGAA--EG-AGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157, 






GAA G AG+RG +PGP P AG +GPP A 




Sbjct: 


149 


GPKGAAGEPGKAGERG-VPGPPGAVG--PAGKDGEAGAQGPPGPA 190 




Score 


= 113 


(17.0 bits), Expect = 5.4e-04, P = 5.4e-04 




Identities = 


= 54/148 (36%), Positives — DO/14o (39s) 




Query: 


7 


GEAGGPGAAWARRAAALPGTA AGPPRPAAP PGAAPARGGPAP-GAPAQALPR 


57 




G AG PGA A PG A AGPP PA P PG G P P GA A P 




Sbjct: 


374 


GFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPP 


433 


Query: 


58 


SQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 


117 






G A P G PG PG +G G GR V 




Sbjct: 


434 


GATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPAGRPGEVGP 


486 


Query: 


118 


PGAAEGAGDRGHLPGPD— ARDPELPRVFLPLAGLRG 152 






PG AG++G PG D A P P +AG RG 




Sbjct: 


487 


PGPPGPAGEKG-APGADGPAGAPGTPGP-QGIAGQRG 521 




Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03
Identities = 54/151 (35%), Positives = 60/151 (39%)



Query: 


7 


GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG— AAPAR-GGPAP-GAPAQALPRSQRGR 


62 






GE G G A + LPG A GPP A PG P GPP GA + +RG 




Sbjct: 


194 


GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 


252 


Query: 


63 


QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 


122 






+ PR GA G GD A G+ G +G R A L PG 




Sbjct: 


253 


EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGL PGPK- 


307 


Query: 


123 


GAGDRGHLPGPDARD — PELPRVFLPLAGLRGPPAAA 157 








GDRG GP DP V L G GPP A 




Sbjct: 


308 


— GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340 




Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03
Identities = 55/154 (35%), Positives = 60/154 (38%)




Query: 


4 


NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 


61 






NG+ GEAG PG R P A G P A PG RG GA A P +G 




Sbjct: 


67 


NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 


125 


Query: 


62 


RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSL ADLL 


115 






+ NGP+G PG PG A GG G V A 




Sbjct: 


126 


EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 


184 


Query: 


116 


QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 1 57 








PG A AG+RG GP A P F L G GPP A 




Sbjct: 


185 


GPPGPAGPAGERGE-QGP-AGSPG FQGLPGPAGPPGEA 220 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 44/131 (33%), Positives = 49/131 (37%)




Query: 


2 


EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 


60 






E GE G PG R LPG GP A PG A RG P P GA A + 




Sbjct : 


126 


EPGSPGENGAPGQMGPR GLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 


181 


Query : 


61 


GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 


120 






G Q P RG G PG G+ G G G+ DL PG 




Sbjct: 


182 


GAQGPPGPAGPAGERGEQG PAGS PG — FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 


237 


Query: 


121 


AEGAGDRGHLPG 132 








+ G+RG PG 




Sbjct: 


238 


SGARGERG-FPG 248 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 43/131 (32%), Positives = 55/131 (41%)




Query: 


7 


GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 


66 






GEAG G A R A PG GPP PGA GP PGA Q + + G A+ 




Sbjct: 


347 


GEAGPSGPAGTRGA PGDR-GEPGPPGPAGFA GP- PGADGQPGAKGEPGDAGAK 


397 


Query: 


67 


RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 


126 






+ P G PG G++ A +GA G G + A + PG + AG 




Sbjct: 


398 


GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 


456 


Query: 


127 


RGHLPGPDARD 137 








G PGP ++ 




Sbjct: 


457 


PGP-PGPAGKE 466 




Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03
Identities = 56/162 (34%), Positives = 62/162 (38%)




Query: 


7 


GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 


64 






G G PGA A G GP P PGA ARG P P Q PR +G 




Sbjct: 


608 


GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARG PAGP-QG-PRGBKGZTG 


662 


Query: 


65 


AERNGRPRRHRG ALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLA-DLLQ-LPG 


119 






+ + + HRG PG PG GA G RG SDL LPG 




Sbjct: 


663 


ZZGBRGI KGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 


722 



Query: 120 AASGAGDRGHL — PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 

G RG GP A P P P G GPP-t- L +P Q 

Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPG P-PGPPGPPSGGYDLSFLPQPPQ 768 

Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02 
Identities = 49/148 (33%), Positives = 55/148 (37%)



Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPA QALPRSQRGR 62 

G AG PG A R PG A GP A G A A+G P P PA + P G 
Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGP AGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122

Q P G + G PGDL A G G RG R + PG A 

Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAP GPSGARGERGFPGE-RGVEGP PGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 

GGPGD + PG+GP 
Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP— GSQGAP 289 
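
Each HSP header in these reports states the identity and positive counts both as a fraction and as an integer percentage. The following is a minimal sketch for cross-checking such a header, not part of the BLAST output itself. The function name `parse_hsp_header` is hypothetical, and truncating division is an assumption about how the percentages were rounded.

```python
import re

# Header text of the HSP above (Score = 101), with "=" after "Positives".
HEADER = (
    "Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02\n"
    "Identities = 49/148 (33%), Positives = 55/148 (37%)"
)


def parse_hsp_header(text):
    """Extract score, bits, expect and recomputed percentages (hypothetical helper)."""
    score = re.search(r"Score\s*=\s*(\d+)\s*\(([\d.]+) bits\)", text)
    expect = re.search(r"Expect\s*=\s*([\d.eE+-]+)", text)
    ident = re.search(r"Identities\s*=\s*(\d+)/(\d+)", text)
    posit = re.search(r"Positives\s*=\s*(\d+)/(\d+)", text)
    id_num, id_den = int(ident.group(1)), int(ident.group(2))
    pos_num, pos_den = int(posit.group(1)), int(posit.group(2))
    return {
        "score": int(score.group(1)),
        "bits": float(score.group(2)),
        "expect": float(expect.group(1)),
        # Assumption: percentages are truncated, not rounded.
        "identity_pct": 100 * id_num // id_den,
        "positive_pct": 100 * pos_num // pos_den,
    }


hsp = parse_hsp_header(HEADER)
print(hsp["identity_pct"], hsp["positive_pct"])
```

For this header the recomputed values agree with the printed 33% and 37%.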

Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02 
Identities = 40/130 (30%), Positives = 48/130 (36%) 



Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60
G G PG + PG A+GP P PPG GGAPGP+P +
Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117
G R L G P + HRG G GD +G G G + L
Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147

Query: 118 PGAAEGAGDRG 128
PG AG+ G
Sbjct: 148 PGPKGAAGEPG 158



Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02 
Identities = 53/156 (33%), Positives = 61/156 (39%) 

Query: 7 GEAGGPGAAWARRA AALPGT- -AAGPPRPAAPPGAAPARG- -GPA PGAPAQAL 55 

G G PGA R A PG AGPPPG+RG GPA P PA A 

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108 

PR +G + + + HRG G PG + +G G G 

Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705 

Query: 109 RSLADLLQL PGAAEGAGDRG — HLPGPDARDPELPRVFLPLAGLRGPP 154 

PG+A G G LPGP P PR AG GPP 

Sbjct: 706 PGSAGSPGKDGLNGLPGPIG— PPGPRGRTGDAGPAGPP 742 

Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02 
Identities = 51/158 (32%), Positives = 58/158 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGTR AGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60 

G G G R AA LPG AGP PG RG P G P A + 

Sbjct: 287 GAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDK 346 

Query: 61 GRQLAERNGRPRRHRGA LAQPGH PGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117 

G A+GP RGA +PG PG GA G +G + D 

Sbjct: 347 GE--AGPSG-PAGTRGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP- 402

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 

PGAAGG+ AP+R GGPAAR 
Sbjct: 403 PGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGR 444 

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02 
Identities = 46/152 (30%), Positives = 57/152 (37%) 

Query: 6 NGEAGGPGAAWARRAAALPGTAA — GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62 

+G G PGA + PG G PA PG A G P P PA ++ R + G 

Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

P RG G G+ +G G RG H R + L PG 

Sbjct: 634 AGPIGPVGPAGARGPAGPQGPRGB KGZTGZ ZGBRGI KGH-RGFSGLQGPPGPPG 686

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

G++G PAP AG RGPP +A 

Sbjct: 687 SPGEQG--PS-GASGP AGPRGPPGSA 709 



Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02 
Identities = 45/134 (33%), Positives = 56/134 (41%) 



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 



24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR — GALAQ 8 0 

P GPP PG +G P PG P + P RG G P ++ G + 

21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP GPPGKNGDDGEAGK 75 

81 PGHPGDLAA-GV — GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH--LPGPDA 135 

PG PG+ G RG G G H R + L G A AG +G PG + 

7 6 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134 

136 RDPEL-PRVFLPLAGLRGPPAAA 157 
+ + PR LP G GP AA
Sbjct:


135 


Score 


= 92 


Identities - 


Query : 


7 


Sbjct: 


347 


Query : 


66 


Sbjct: 


406 


Query : 


122 


Sbjct: 


466 


Score 


= 92 


Identities = 


Query: 


7 


Sbjct: 


587 


Query: 


61 


Sbjct: 


647 


Query: 


116 


Sbjct: 


704 


Score 


= 90 


Identities : 


Query: 


7 


Sbjct: 


485 


Query: 


66 


Sbjct: 


539 


Query: 


115 


Sbjct: 


599 


Score 


= 83 


Identities - 


Query: 


7 


Sbjct: 


311 


Query: 


61 


Sbjct: 


368 


Query: 


121 


Sbjct: 


424 


Score 


= 82 


Identities = 


Query: 


7 


Sbjct: 


275 


Query: 


67 


Sbjct : 


333 


Query: 


127 


Sbjct: 


388 



135 APGQMGPRG-LP — GFPGPKGAA 154 



1.7e-01, P = 1.5e-01 
Lves = 58/155 (37%) 



GEAG G A R A 



GPP PA 



G + PG 



EG+ G RG GP R E+ 



GA G 



A G P A G P 



GR 



PG A 



P AG +G P A 



51/156 (32%), Positives 



31, P = 1.5e-01 
57/156 (36%) 



G G PGA 



G PR +G 



PG AGPPPG+RG 



+ G G G +G G 



+ P 



45/134 (33%), Positives = 53/134 (39%) 



G G PG A + A 



G A 



+ G PG + 



PGA RG 



G P Q 



R +RG L 
-RGERGFPGLP 538 



-AGV GR-GAGGGHSRRGRHHHVRSLADL 114 

AG GR GA G GR + D 



PAG 



PGP 



12.5 bits), Expect = 1.8e+00, P = 8.3e-01 
49/156 (31%), Positives = 56/156 (35%) 



G+AG GA A 



+ G 



GPP PA PG 



G GPA GAP 



PG 



G PGD 



RV 



G G G 
3DAGPPGPAGPAGPF 

LAGLRGPPAAAVRE 
AG GPP A +E 
NAGPPGPPGPAGKE 

P = 9.0e-01 



R + 



PG 



G+AG PGA ++ A L G 



PG 



RG 



R L 



G P 



+G PG 



G PG G+ G G RG A PGA G 

-GPAGAPGDKGEAGPSGPAGTRGAPGDRGEPGPPGP-AGFAGPPGADGQPGA 387 



A+ 



P AG GPP 



Peptide information for frame 3

ORF from 12 bp to 755 bp; peptide length: 248
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (17-39) 
LEUCINE_ZIPPER (24-46) 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 3 

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score
= 135, P = 1e-06

TREMBL:HS6302_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L,
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107,
P = 0.0023



>TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo sapiens 
TNF-inducible protein CG12-1 mRNA, complete cds. 
Length = 331 



HSPs: 

Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06
Identities = 30/103 (29%), Positives = 55/103 (53%) 

Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89

++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A 
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNVVSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150 

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132 

G+G+ A IT+ + + +S E + AT D+++ 

Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193

Pedant information for DKFZphtes3_18f3, frame 2



Report for DKFZphtes3_18f3.2

[LENGTH] 193

[MW] 19708.24

[pI] 11.90

[KW] All_Alpha

[KW] LOW_COMPLEXITY 55.44 %

SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc 

SEQ LPHPQAGGGGHQG 

SEG xxxxxxxxxxxxx 

PRD ccccccccccccc 



(No Prosite data available for DKFZphtes3_18f3.2)
(No Pfam data available for DKFZphtes3_18f3.2)

Pedant information for DKFZphtes3_18f3, frame 3

Report for DKFZphtes3_18f3.3



[LENGTH] 248

[MW] 27162.56

[pI] 9.92

[PROSITE] LEUCINE_ZIPPER 2 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 30.65 % 

[KW] COILED_COIL 12.10 % 

SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX .XXXXXXXXXXXXXXXXXXXX . .XXX 



PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 

COILS 

MEM 

SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 

COILS 

MEM 

SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 



Prosite for DKFZphtes3_18f3.3

PS00029 17->39 LEUCINE_ZIPPER PDOC00029

PS00029 24->46 LEUCINE_ZIPPER PDOC00029



(No Pfam data available for DKFZphtes3_18f3.3)

DKFZphtes3_1817



group: cell structure and motility 

DKFZphtes3_1817 encodes a novel 1050 amino acid protein with weak partial similarity to
ankyrins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins which interconnect integral proteins with the 
spectrin-based membrane skeleton. Thus the novel protein seems to be involved in coupling the
cytoskeleton to the cell membrane.

The new protein can find application in modulation of cytoskeleton-membrane interactions.
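
The ATP/GTP-binding site motif A (P-loop) mentioned above corresponds to the PROSITE entry PS00017, whose consensus pattern is [AG]-x(4)-G-K-[ST]. As an illustration only, the sketch below scans residues 901 to 960 of the frame 2 peptide of this entry for that pattern. The helper name `find_p_loops` is hypothetical, and the regular expression is a hand translation of the PROSITE consensus, not the scanning tool used to produce the report.

```python
import re

# Residues 901-960 of the frame 2 peptide of DKFZphtes3_1817.
FRAGMENT_START = 901
FRAGMENT = (
    "ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKG"
    "KTSREIMARD"
)

# PROSITE PS00017 consensus [AG]-x(4)-G-K-[ST] written as a regex (assumed translation).
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(fragment, offset):
    """Return (1-based start position, matched residues) for each P-loop hit."""
    return [(m.start() + offset, m.group()) for m in P_LOOP.finditer(fragment)]


hits = find_p_loops(FRAGMENT, FRAGMENT_START)
print(hits)
```

Under these assumptions the single hit starts at residue 945, the start position reported for the motif in the peptide information of this entry.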



similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found



1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT 
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TGATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT 
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 AGCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC 
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG 
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC CTATCACCTG TCCTTCGAGA 
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC 
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG 
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA 
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT
2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGCTGA
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG
4501 G



BLAST Results 



No BLAST result



Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (945-953)
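
The ORF coordinates, the reading frame and the peptide length quoted in these reports are related by simple arithmetic. The sketch below is illustrative only: it assumes 1-based inclusive coordinates that exclude the stop codon, and the function name `orf_summary` is hypothetical.

```python
def orf_summary(start_bp, end_bp):
    """Return (frame, peptide_length) for a 1-based, inclusive ORF span."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    # Frame 1 starts at base 1, frame 2 at base 2, frame 3 at base 3.
    frame = (start_bp - 1) % 3 + 1
    return frame, span // 3


# ORF reported for DKFZphtes3_1817: 134 bp to 3283 bp.
print(orf_summary(134, 3283))
```

Under these assumptions the span 134 to 3283 gives frame 2 and 1050 residues, and the span 12 to 755 of the preceding clone gives frame 3 and 248 residues, matching the values in the reports.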



1 MALYDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST
51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF
101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE
151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK
201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ
251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ
301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY
351 CLTSFEAAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC
401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND
451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL
501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY
551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET
601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS
651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE
701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV
751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN
801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_1817, frame 2 

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21

PIR:I49502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score =
380, P = 8.2e-31



>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 

HSPs: 



Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31
Identities = 139/447 (31%), Positives = 207/447 (46%)




Query: 


4 62 


Kuril FLU V Art vt^tjlJAb Jj 1 UJjIj V bi\oAI*lvNA 1 UinbA 1 r.briljAL.lJft-0 I Ub V 1 uLiLiLiri I KAo 


D £. 1 






-i_T> r Ux7\ 7> i f{*\ J- J- TTTj. r™ 7\ (7M TV C TDT 1~~\-L. -1- \t TT 71J- 

+(j+l Lti-rPJ\+ L,y -*-+■ JjV+ i^rPi \JPIA Lj 1 FLi-ttA (Jt + V Jjjj A+ 




Sbjct: 


77 


KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 


136 


Query: 


522 




5 58 






17 A./"" TBT J. 7\ /"up l 1/ T -L, V -L. DT 

V xtj 1 rii tA (jirltjT V Lit i t KL» 




Sbjct : 


137 


QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 


196 


Query: 


559 


DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 


615 






D+- ++ G TPLHIAA + V + LL GAS + TPL A S+ +V + 




Sbjct: 


197 


PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA — SRRGNVIM 


254 


Query: 


616 


AInijat LKKUKbbfcjAF Vljb r(JKb V UbibQfcjbblb br bbMbAGbK— (Jfc,hj 1 KKD I Ktv CjJ\1j 


b / J 






L +R + E + + ++ S + G+ Q +TK + 




Sbjct: 


255 


V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 


311 


Query: 


674 


LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 


732 






A GD L+ VR LL++ E ++D T+ P H C R+AKV 




Sbjct: 


312 


AAQGDHLDCVRLLLQYDAE-IDDI — TLDIILTP--LHVAAHC GHHRVAKVLL 


358 


Query: 


733 


S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 


791 






G N + +G +PLH+A ++ LLLK GA+ A PLH+A GH 




Sbjct: 


359 


DKGAKPNSRALNGFTPLHI ACKKNHVRVMELLLKTGASI DAVTESGLTPLHVASFMGHLP 


418 


Query: 


792 


VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 


851 






+VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 




Sbjct: 


419 


IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 


478 


Query: 


852 


IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQW 896 








H +V+LLL + A++ T+A+ ++L+^ 




Sbjct: 


479 


RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 52 3 




Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30
Identities = 130/447 (29%), Positives = 195/447 (43%)




Query: 


4 65 


TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 


524 






TPLH AA G + ++L+ GA + A +G +P+H+A Q + LLL Y A + 




Sbjct: 


274 


TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 


333 


Query: 


525 


QDNNGNTPLHLACTYGHEDCVKALVYYDVE SCR 


557 






+ TPLH+A GH K L+ + +C+ 




Sbjct: 


334 


ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 


393 


Query: 


558 


LDIGNEKGDTPLHI AARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 


614 






+ D EG TPLH+A+ G+ +++ LLQ GAS + N ETPL A + V 




Sbjct: 


394 


GASI DAVTESGLTPLHVASFMGHLPI VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 


453 











12/13/10, EAST Version: 2.4.2.1 



WO 01/12659 



PCT/IB00/01496 



Que ry : 


0 ± 0 


C7AVUT QFFPPnTfC!CTr7N nUACT)ADC T 7nCT CDF CT QQFCQMC; RnFFTKKnYRFVFKT.T, 


674 






_1_ V T -L -L. -I_ /~\ t V*t T -4- -t-A T f 

+ I L T T T Q+P -L + +ft * ■ Lj 




Sbjct: 


454 


K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH TPLH 


508 


Query. 


O / 3 


r AWAnfiriT.FMVR yt t PWTrrnr FnaFnTV^AAnPFFPHPT.rnrPKrAPAriKRr.AKVPA^f^ 


734 






7A A-C 4-F U T T U 1 4-4- AT P T4 4- K A4- T 4- 




Sbjct: 


509 


I AAREGHVETVLALLE KEASQACMTKKGFTP — LHVAAKYGKVRVAELLLER D 


559 


Query. 


735 


J_lO V IN V J. •jyUOOij rilfl V nnljnij tVrt LJLiX I\Jj.Lj.Ljr\.rivJrtl^rt*Jrtr\lN£"! lU ^rt VTL 1 1 J_irt^— . \£\£\JIL £ \,/ V V 1\. 


794 






M -L.-L.C J.DT UWT\ U ri4-4-4-T T T C 4- 4- a. DT U4-7A 4-P -i-\7 4. 




Sbjct: 


560 


AHFNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 


619 


Query : 


1 J 0 


n T FtQKlA VPMKTf m ^CWTPT T VaPQr^HHFT.VAT T T nHf^A^TMA^WNKf^KITAT.T-IFAVTFK 


854 






LL N + + G TPL A GH E+VALLL A+ N N G T LH E 




Sbjct: 


620 


SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 


679 


Query : 


855 


HVt V VhjijLijJjrlljAb VIJV LiDJ l\KyKi A V ULAhiy iNbrAlMEjljlj oyj 








HV V ++L+ HG V + T + A N K+++ L 




Sbjct: 


680 


HVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720 




Score = 367 (55.1 bits), Expect = 1.6e-29, Sum P(2) = 1.6e-29
Identities = 131/489 (26%), Positives = 210/489 (42%)




Query: 


404 




4 60 






UT AC P M \7 T T 4- 4- 4. DTP 4. 4-C T T*1 4-4- 
rl _LAb oN V Li Li +■ t + rL L t ta Li U +t 




Sbjct: 


244 


HIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 


302 


Query: 


461 


nnPHTPT Huaauonna tht t " l ;cK , (^:7iM\7MA , T'nvHr;ATPT ht APHKrivri^UTT ttt hvkh 

DKbn 1 IT Liti v rtH V LuU"^L ± LJJj Jj V o IxorVIri V IN ft 1 LJ I nuri 1 tr Litl Jjrt^l^l\Lj lyov 1 ijJjijijn I I\rt 


520 










Sbjct: 


303 


KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 


362 


Query: 


521 


C A "/"\r*lXT\T/"'M l T l n T U T 7\ /"TVPLirnPUL' A T 1* W "MfPCfOT r\T fMPffnTDT tl T 7\ t\ DHr'vnfi 7 

bAhjVyiJNNLiN 1 FLiMLiA^ 1 I vjritjlJ(^ VKALiV 1 1 UV&oUKljlJilaNnjlSLjlJl cLril AAKWtj I IJtjV 


con 
Do U 






+ NG rPLn+AC n ++ L+ +D h. G 1 rL.n+A+ G+ + 




Sbjct: 


363 


KPNSRALNGFTPLHIACKKNHVRVMELLLK TGASIDAVTESGLTPLHVASFMGHLPI 


419 


Query: 


581 


T DrriT t r~\\ir^ JCTCTAMDI i/fTDT wfnr MClfTT CinunTTtVUT CITC'DIilMi'CCrTlDUriCDriU 

1 £.1 LiliLJWijAb 1 ti lyNKLilNIli 1 f J_it\l«,Alj WbMLbVl'li.a IHLbt HdKKljrS.bbll.Air VL)b f IJK 


DJ / 






■ ■ T T ^ ^ j_>iT rimnT tl _i_L* _i_T~i_i_ T*> 

+ + LLQ GAS +■ N LLPL A ++ + + + + K + P+ K 




Sbjct: 


420 


VKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAAR 


479 


Query : 


638 


— — — — C\7I*tCT COrCCTCCffCCMCJii* , CPnirPTI*IfnVPFUiriif T T R Ji'i/L.riPriT FMVRVT T FTjOTF 
JvUoloyLiJO 1 O O " O bnOnl"i3t\L)ljD 1 nl\Ul KCi V Cj I\ J_i J_i KM V rt L-'Li UJjCj1 v 1 v Kl J_ji_iCiln(l 


693 










Sbjct: 


480 


IGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTP 


539 


Query: 


694 


bULiCiUHEjUl V D/WU PLC ^tlfij^V *— t ~fv^AKMyr\KijMI\Vr ft OuiiO V IN V 1 O 


id 1 

f H X 






T V ft+ rlf f A I) V \J T T 




Sbjct: 


540 


LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPA 


599 


Query: 


742 


\{U\jO O r lill V ftrtljri>jKjr\l-'l_i J. KIjIjIj t\ llwrt LNrt\3rt t\lNrt U^rt VCbn iirtljy^ljnr ^ V V IX"^, 1_» J_j L/OUrl 


801 






J_r^ J.DT LJ _i_ TV 7\ j_ j_ j_ □ T T _i Li" 1 Ti 7> i DT LIT 7\ 1" TJ J.4.\7 T T A 

tb tr Liti+AA +- ++■ K IiL + t-Ij +A A + r JjitlijA v+kjJi + + V LiJj A 




Sbjct: 


600 


WNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 


659 


Query: 


802 


ifDMif ifriT crMTDT TynrGrruucT \/qt r t nur atirKzicMMKrMTfli uitzi\7tc"K"ui7 f\/ w it t 
rxrNtSRLJjbbWl irijl I A^btjtjnrltljlj v MLLLynbrtb 1 IN AbPJ NMjN ImLiIIc-A V IJjMIVE V V L.Li 


ODl 






N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ 




Sbjct: 


660 


NGNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKF 


719 


Query: 


862 


T T T U(~ A C17l~i"i7r M k" Q "7 /I 

LiliLirlijiAb Vy V LiIN fv 0 / H 








LL H A V K 




Sbjct: 


720 


LLQHQADVNAKTK 7 32 




Score = 345
Identities = 146/506 (28%), Positives = 233/506 (46%)




Query: 


404 


14 T D. Q — — P M f"M(T IT "i 7 IT DT T QflfiTlH nKTlTUrrK" Mr* 14 DIP FC^T^^^^VK^ \7QPR1 HFlP^WTDrQ 


458 






I-14-IiQ P+ tc T7 t r j-C 1 i -p j. if 14 4_4-4,w 4_K1 4- \7 4- 

tlTAb VjT-KV LL T£j T 1 T[\ H 1^T + V I IN TV T 




Sbjct: 


50 


HLASKEGHVKMVVELLHKEIILETTTKKGNTALHI AALAGQ-DE VVRELVNYGANVN — A 


106 


Query: 


459 


RDDRGHTPLHVAAVCGQASIjI DLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 


518 






4. 4-p TDT 4-4-iia 4-4- T 4- r**7i M f TDT 4-7A ri4-r 4- 4- 4-\7 T 4-4-V 
+ +\j 1 Jr 'MA • ' Jjt OA W 1 trJ-t +M LJtljTTtV L + t i 




Sbjct: 


107 


QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 


166 


Query: 


519 


KA^AFVOnNNGNTP-T.HT.ArTYGHFnrVKAT.VYYnUF^rRT.DTGNIFKGDTPT.HT AARWGY 


577 




+ V+ f LipItA ttU A V + L7t t+ O 1 fJjrtl AM ■ 




Sbjct: 


167 


GTKGKVR LPALHIAAR--NDDTRTAAVLLQNDP-NPDVLSKTGFTPLHIAAHYEN 


218 


Query: 


578 


QGVIETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQS 


634 






V + LL GAS + TPL A N ++ ++ E + K P+ 




Sbjct: 


219 


LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 


278 


Query: 


635 


PQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 


693 






R+ E + + A +TK + A GD L+ VR LL++ 




Sbjct: 


279 


AARNGHVRISEILLDHGAPIQA KTKNGLSPIHM AAQGDHLDCVRLLLQYDA 


329 






Query:


694 


EDLEDAE-DTVSAAD-PEFC--HPLCQC PK CAPAQKRLAK 


729 






E ++D D++ CH + + P C R+ + 




Sbjct: 


330 


E-IDDITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSFALNGFTPLHIACKKNHVRVME 


388 


Query:


730 


VPA-SGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQG 


788 






+ +G ++ ++ G +PLHVA+ G +++ LL+ GA+ N PLH+A + G 




Sbjct: 


389 


LLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAG 


448 


Query: 


789 


HFQVVKCLLDSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGAS INASNNKGNTALH 


848 






H +V K LL + AK N K TPL A GH +V LLL++ A+ N + G+T LH 




Sbjct: 


449 


HTEVAKYLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLH 


508 


Query: 


849 


EAVI EKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIM — ELL 893 








A E HV V LL AS + K+ T + A + K+ ELL 




Sbjct: 


509 


IAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELL 555 




Score = 243 (36.5 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14
Identities = 64/199 (32%), Positives = 97/199 (48%)

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461
H+A+ G + E LL ++ H + PL L +L P +P S
Sbjct: 541 HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 600

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521
G+TPLH+AA Q + L+ G NA G TPLHLA Q+G+ + LLL +A+
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581
+ + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV---MVDATTRMGYTPLHVASHYGNIKLV 717

Query: 582 ETLLQNGASTEIQNRLKETPL 602
+ LLQ+ A + +L +PL
Sbjct: 718 KFLLQHQADVNAKTKLGYSPL 738




Score = 242 (36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29
Identities = 63/176 (35%), Positives = 92/176 (52%)

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793
G VN T Q+G +PLH+A+ G ++RLLL GA + D+ PLH A + GH ++
Sbjct: 229 GASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRIS 288

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853
+ LLD A K +G +P+ A G H + V LLLQ+ A I+ T LH A
Sbjct: 289 EILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHC 348

Query: 854 KHVFVVELLLLHGA--SVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAET 909
H V ++LL GA + + LN + C + + ++MELL AS+D V E+
Sbjct: 349 GHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTG---ASIDAVTES 403


Score = 242 (36.3 bits), Expect = 3.3e-14, Sum P(2) = 3.3e-14
Identities = 80/284 (28%), Positives = 129/284 (45%)

Query: 404 HIAS--GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDD 461
HIA+ G+ + V LL +E +K PL K+ L P +
Sbjct: 508 HIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGK 567

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521
G TPLHVA ++ LL+ +G ++ ++G TPLH+A ++ V LL Y S
Sbjct: 568 NGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGS 627

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVI 581
A + G TPLHLA GH + V L+ ++GN+ G TPLH+ A+ G+ V
Sbjct: 628 ANAESVQGVTPLHLAAQEGHAEMVALLLSKQANG---NLGNKSGLTPLHLVAQEGHVPVA 684

Query: 582 ETLLQNGASTEIQNRLKETPLKCAL---NSKILSVMEAYHLSFERRQKSSEAPV-QSPQR 637
+ L+++G + R+ TPL A N K++ + + + K +P+ Q+ Q+
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744

Query: 638 S-VDSISQ--ESSTSSFSSMSAGSRQEETKK--DYREVEKLLRAVAD 679
D ++ ++ S S G+ K Y V +L+ V D
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVVTD 791




Score = 235 (35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34
Identities = 58/165 (35%), Positives = 83/165 (50%)

Query: 734 GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 793
G N S G +PLH+AA G A+++ LLL AN N PLHL Q+GH V
Sbjct: 625 GGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVA 684

Query: 794 KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 853
L+ + G TPL A G+ +LV LLQH A +NA G + LH+A +
Sbjct: 685 DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 744

Query: 854 KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS--KIMELLQVV 896
H +V LLL +GAS ++ T + A++ + ++L+VV
Sbjct: 745 GHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789




Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34
Identities = 67/202 (33%), Positives = 100/202 (49%)

Query: 404 HIAS-GNQKEVERLLSQEDHDKDTVQKMCH--PLCFCDDC-EKLVSGRLNDPSVVTPFSR 459
H+A+ G+ + RLL Q D + D + + H PL C V+ L D P SR
Sbjct: 310 HMAAQGDHLDCVRLLLQYDAEIDDIT-LDHLTPLHVAAHCGHHRVAKVLLDKGA-KPNSR 367

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519
G TPLH+A +++LL+ GA ++A G TPLH+A G+ + LL
Sbjct: 368 ALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRG 427

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579
AS V + TPLH+A GH + K L+ +++ + TPLH AAR G+
Sbjct: 428 ASPNVSNVKVETPLHMAARAGHTEVAKYLLQ---NKAKVNAKAKDDQTPLHCAARIGHTN 484

Query: 580 VIETLLQNGASTEIQNRLKETPLKCA 605
+++ LL+N A+ + TPL A
Sbjct: 485 MVKLLLENNANPNLATTAGHTPLHIA 510




Score = 226 (33.9 bits), Expect = 7.0e-33, Sum P(2) = 7.0e-33
Identities = 53/153 (34%), Positives = 83/153 (54%)

Query: 743 DGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNAK 802
+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A
Sbjct: 601 NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN 660

Query: 803 PNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 862
N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L
Sbjct: 661 GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFL 720

Query: 863 LLHGASVQVLNKRQRTAVDCAEQ--NSKIMELL 893
LHAV K + + A Q ++ I+ LL
Sbjct: 721 LQHQADVNAKTKLGYSPLHQAAQQGHTDIVTLL 753




Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11
Identities = 51/157 (32%), Positives = 82/157 (52%)

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796
+ T++ G++ LH+AAL G+ +++R L+ +GAN A+ + PL++A Q+ H +VVK L
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFL 130

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856
L++ AN G TPL A GH +VA L+ +G ALH A
Sbjct: 131 LENGANQNVATEDGFTPLAVALQQGHENVVAHLINYGTK----GKVRLPALHIAARNDDT 186

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAE--QNSKIMELL 893
+LL + + VL+K T + A +N + +LL
Sbjct: 187 RTAAVLLQNDPNPDVLSKTGFTPLHIAAHYENLNVAQLL 225




Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29
Identities = 55/143 (38%), Positives = 68/143 (47%)

Query: 463 GHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASA 522
GHTPLH+AA G + L+ K A G TPLH+A + G V LLL A
Sbjct: 503 GHTPLHIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHP 562

Query: 523 EVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIE 582
NG TPLH+A + + D VK L+ S N G TPLHIAA+ V
Sbjct: 563 NAAGKNGLTPLHVAVHHNNLDIVKLLLPRG-GSPHSPAWN--GYTPLHIAAKQNQVEVAR 619

Query: 583 TLLQNGASTEIQNRLKETPLKCA 605
+LLQ G S ++ TPL A
Sbjct: 620 SLLQYGGSANAESVQGVTPLHLA 642




Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28
Identities = 54/185 (29%), Positives = 89/185 (48%)

Query: 738 NVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 797
N+ ++ G +PLH+ AG + +L+KHG A PLH+A G+ ++VK LL
Sbjct: 662 NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 721

Query: 798 DSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 857
A N K G +PL A GH ++V LLL++GAS N ++ G T L A ++
Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917
V ++L + V++ V+S PV + DV+E + +E ++
Sbjct: 782 VTDVLKV VTDETSFVLVSDKHRMS FPETVDEILDVSEDEGEELISF 827

Query: 918 KIRKK 922
K ++
Sbjct: 828 KAERR 832




Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29
Identities = 41/121 (33%), Positives = 67/121 (55%)

Query: 486 GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 545
G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V
Sbjct: 35 GVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVV 94

Query: 546 KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 605
+ LV Y ++ ++KG TPL++AA+ + V++ LL+NGA+ + TPL A
Sbjct: 95 T5TTT UMV (TlTiTvTVNriin^riK'^FT PT.VMA APiFMHI .FWK FT.T.FNGANOTJ VAT F.PjGFT PTiAUA 151

Query: 606 L 606
L
Sbjct: 152 L 152




Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06
Identities = 89/318 (27%), Positives = 140/318 (44%)

Query: 448 LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKG 507
L + + V ++DD+ TPLH AA G +++ LL+ AN G TPLH+A + + G
Sbjct: 457 LQNKAKVNAKAKDDQ--TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREG 514

Query: 508 YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD--------------- 552
+ L LL +AS G TPLH+A YG + L+ D
Sbjct: 515 HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 574

Query: 553 --VESCRLDI GNE KGDTPLHIAARWGYQGVIETLLQNGASTEIQNRL 597
V LDI G+ G TPLHIAA+ V +LLQ G S + +
Sbjct: 575 VAVHHNNLDIVKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQ 634

Query: 598 KETPLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 656
TPL A M A LS +Q + +S + ++QE +
Sbjct: 635 GVTPLHLAAQEGHAE-MVALLLS---KQANGNLGNKSGLTPLHLVAQEGHVPVADVLIKH 690

Query: 657 GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 716
G + T + LA G++++V++LL+ + D+ +A+ + + PL Q
Sbjct: 691 GVMVDATTR--MGYTPLHVASHYGNIKLVKFLLQH-QADV-NAKTKLGYS------PLHQ 740

Query: 717 CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 751
+ + + +G N S DG++PL +A
Sbjct: 741 AAQQGHTDI-VTLLLKNGASPNEVSSDGTTPLAIA 774




Score = 162 (24.3 bits), Expect = 1.8e-07, Sum P(2) = 1.8e-07
Identities = 48/149 (32%), Positives = 71/149 (47%)

Query: 737 VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 796
V D ++ AA G D L++G + N + LHLA ++GH ++V L
Sbjct: 5 VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 64

Query: 797 LDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 856
L GNT LAG E+V L+ +GA++NA + KG T L+ A E H+
Sbjct: 65 LHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 124

Query: 857 FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885
VV+ LL +GA+ V + T + A Q
Sbjct: 125 EVVKFLLENGANQNVATEDGFTPLAVALQ 153




Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26
Identities = 38/135 (28%), Positives = 65/135 (48%)

Query: 460 DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 519
+ G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y
Sbjct: 42 NQNGLNGLHLASKEGHVKMVVELLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYG 101

Query: 520 ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQG 579
A+ Q G TPL++A H + VK L+ ++ E G TPL +A + G++
Sbjct: 102 ANVNAQSQKGFTPLYMAAQENHLEVVKFLLE---NGANQNVATEDGFTPLAVALQQGHEN 158

Query: 580 VIETLLQNGASTEIQ 594
V+ L+ G +++
Sbjct: 159 VVAHLINYGTKGKVR 173
Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21
Identities = 37/119 (31%), Positives = 58/119 (48%)

Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ--DNNGNTPLHLACTYGHEDCVKALVYYDVE 554

AT A + G ++ L H + ++ + NG LHLA GH V L++ ++
Sbjct: 13 ATSFLRAARSG--NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614

L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+ 

Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127 

Query: 615 E 615 
+ 

Sbjct: 128 K 128 

Score = 106 (15.9 bits), Expect = 1.8e-01, Sum P(2) = 1.6e-01
Identities = 34/121 (28%), Positives = 54/121 (44%) 

Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828 

+ G R AD A A + G+ L + N + +G L A GH ++V 

Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63 

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888 

LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q + 

Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123 

Query: 889 I 889 
+ 

Sbjct: 124 L 124 
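
The Expect and P columns in these HSP headers are related. As a hedged illustration only, assuming the usual Poisson relationship of BLAST-style statistics (P = 1 - exp(-E)), the sketch below converts an Expect value into a probability. The function name `expect_to_p` is illustrative and is not part of the original report.

```python
import math


def expect_to_p(expect: float) -> float:
    """Probability of seeing at least one chance hit when `expect` hits
    are expected, assuming hits follow a Poisson distribution."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect)


# Expect = 1.8e-01 from the header above gives a probability close to
# the reported 1.6e-01; for tiny Expect values the two are nearly equal.
print(f"{expect_to_p(0.18):.2f}")
print(f"{expect_to_p(8.4e-09):.1e}")
```

For the small values that dominate this listing, Expect and P print identically, which is why most headers show the same figure twice.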

Score = 40 (6.0 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 

+RRQ+ EVQ+++Q++ Q++ +K++R V 

Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQVVQGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKV 1669 

Score = 38 (5.7 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14 
Identities = 6/12 (50%), Positives = 10/12 (83%) 

Query: 806 KDLSGNTPLIYA 817 

+D++G T L+YA 
Sbjct: 1186 EDITGTTKLVYA 1197 
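
The "Identities" figure of an ungapped HSP such as the one above can be checked by comparing the two aligned rows position by position. The following is a minimal sketch; the helper name `count_identities` is an assumption made for illustration, and it only handles rows of equal length.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical positions, aligned columns) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A column counts as identical when both rows carry the same residue;
    # gap characters are never counted as identities.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


identical, columns = count_identities("KDLSGNTPLIYA", "EDITGTTKLVYA")
print(f"Identities = {identical}/{columns} ({100 * identical // columns}%)")
```

The "Positives" figure additionally needs a substitution matrix and is therefore not reproduced here.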



Pedant information for DKFZphtes3_1817, frame 2

Report for DKFZphtes3_1817.2

[LENGTH] 1050
[MW] 117013.72
[pI] 6.47
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034c] 5e-13
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-08
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233c] 3e-08
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (-GABP) alpha GA bindini 4e-12
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12
[PIRKW] phosphotransferase 1e-19
[PIRKW] nucleus 1e-13
DTR IfW 1
if X C\l\ /V J 


pu idosi mil L-iicinne x os i o 


' DTRTfWI 


COtiy piULcJ.ll 4C 1J 


■ p i p.KW ] 


t*nmnT simnrp^^nr 1c- HQ 

l_ L_llLl^ J_ O I-*- .1- C VJ I, X C 


PIRKW] 


rin ril if"rit"ion 1 p — 1 4 


r J- r\ rv vv j 


t"^nrlPTH rpnpflt' 1 »— 1 Q 
Lou Jcni i-cpco u XC X _^ 


PIRKW ] 


lit; Lex cjcxxiue l it 11 


tr x r\r\ »v j 


L*LJ La J LULL Llali oUUl L J C X J 


P I RKW ] 


("■oil pvrl o An +• 1 la— in 

L.C X X *— J" — 1- C i-Ull LLUl XC 1U 


' p i RKW ] 


se rine / threonine - "* speci f ic protein kinase Is - 19 


PTRKWl 


t r ansmeniDr dn@ protein 5 s — 15 


PIRKW ] 


L-xclllojyvJxL piULcllt .J c XJ 


PIRKW] 


DNA binding 2e-ll 


P I RKW ] 


UllLU^CiiC xc vu 




ATP 1 e-~ 1 9 


' PTRTCW 1 


nrrit-o i r W "i r~i a q p iriVtiVti't'i'ii'" 1 o-HQ 

pi U LCXll nlUOOC XlIllJ.UXL.vJX XC U J 


' D T R TfM 1 


UU1 LaUc — uflLcu _njii Liiaiuici Jc xj 


' D T RTf W 1 


phosphopr o t e i n 4 e — 3 8 


tr x t\rv w j 


3n r\n1" r»c "i c 1 a-1 Q 
opup LUoxa xe xj 


PIRKW ] 


X X V C X 'JtS VJ 3 


PIRKW] 


111 Ley Llll DXIlUXIly JC ID 


PI RKW ] 


/"I i f f oronf'i a +- i on 9a - 1 7 
Ul llctcllLXaLXUll ^.ri X ^ 


' DTRTfW 1 


transforming protein 1 e — 08 


DTB Tf TaT 1 
, tf X JTCfv/V J 


al T?fl 1 i pi fiA 1 A — A ll 
ct X Lcilla L X v c apil< Liiy XC ID 


' P T R ITTaT 1 


coiled coil 1 e — 14 


' P T D If TAI 1 
, C X KJS./V J 


pSlipilcrdl lilcIILU Ldllc px.CJLtrx.Il ^tr Jo 


' OTBTfUl 

rinM J 


trans crip t ion factor 4 e~ 1 6 


P I RKW ] 


LlalloLIlJJLlUll Icy UlaL J.U11 e- C lu 


. r j.t\is.yv J 


nucleotide binding 5e — 15 


' PTRJf Wl 


pilOapilCJx XC lllUIlCJci Lex IiyUiUlaic IC XL. 


PIRKW] 


LyLUoKclc LUll Cc 


' PTRKWl 


1 mrtHn 1 i n ViinrlinfT 1 p — 1 Q 
Laiiuuuuxxii uxiiuiuy xc x _/ 


DTD VT«I 1 

_ Jr IKfvW J 


smooth muscle le- 12 


JUrt rti*l J 


ankyrin le — 40 


jure t\vi j 


Hoa t"h acenri afoH r"iT"/"^t"Pin H na^fi 1 p — 1 Q 

LLC O. L 11 agoULluLCU LClll MlLaOC XC J- _> 


oUrl Ml v l | 


dnKyxxii lepcoL i loiiLCJX oy y x c iu 


.SUrt -Hl v l J 


nvAhai r Vinaco V~i/^ifrij^l r\n\f 1 a - 1 Q 
piULcxn jv-Xiiaoc i icjuivj x vjy y xe i 3 


, O U C t Hll J 


vdLLXUXu vil Uo t- i . tu ninui l _ Vj pi j Lcxil iivjiiivj x uyy ~j c vj / 


^Url him J 


i nf»^ t - nancf"nrmi nn nrAhoi n 1 a— An 
XII L j L X a ii x u x inxiiy piULciii xc vO 




UliaSdlCjneQ <anKyJ-±Il xtijj^cIL pxCJLeXIls xc jo 


[SUPFAM] notch protein 2e-12
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13
[SUPFAM] rel homology 2e-11
[SUPFAM] EGF homology 2e-12
[PROSITE] ATP_GTP_A 1
[PFAM] Ank repeat
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 3.05 %



SEQ MALYDEDLLKNPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP 

SEG 

lawcB 

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR 

SEG 

lawcB 

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA 

SEG 

lawcB 

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVETYVHHEIYNLIFKYVGTMEASEDAAFN

SEG 

lawcB 

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ 

SEG 

lawcB 

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE 

SEG xxxxxxxxxx 

lawcB 

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE 

SEG 

lawcB 

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID 

SEG 

lawcB 
SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG

SEG

lawcB 

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET 

SEG 

lawcB 

SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ 

SEG xxxxxxxxxxxxxxxxxxxxxx. 

lawcB 

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC 

SEG 

lawcB 

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP 

SEG 

lawcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTTTTCCH 

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN 

SEG 

lawcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHHHHHHHHCCCTTTTEE 

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCV 

SEG 

lawcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHC 

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD 

SEG 

lawcB 

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR 

SEG 

lawcB 

SEQ RHTVEDAVVSQGPEAAGPLSTPQEVSASRS 

SEG 

lawcB 

Prosite for DKFZphtes3_1817.2
PS00017 945->953 ATP_GTP_A PDOC00017

Pfam for DKFZphtes3_1817.2
HMM_NAME Ank repeat

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPLH+AA ++ ++++LL+++GA +N
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN 490

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*
G TPLH+A++ + ++ LLL + A+

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPLH+A+ Y+++++V+ L+ +
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*
G+TPLHIAAR + +++ LLQ+GA+

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G +PLH+AA +++ +++RLLL+HGA+
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771






36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*

PLH+A+++++ ++V+ LL+ +A +N

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPL++A+ ++ E+V LLLQHGA+IN
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*
G+T+LH A+++ +V +V+LLL HGA++

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870
DKFZphtes3_19f19



group: testes derived

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae
protein YFL046w.

The protein contains an RGD cell attachment site.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405 . 0/ . 3 CR from top of Chr11 linkage group"
Insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found 

1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 
51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 

101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 

151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 

201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 

251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 

301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 

351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 

401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 

451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 

501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 

551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 

601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 

651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 

701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG

751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT

801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 

851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 

901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 

951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 
1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 
1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 
1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 
1151 GGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 
1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 
1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 
1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA 
1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS419346 from database EMBL: 
human STS WI-13569. 
Score = 2154, P = 8.6e-91, identities = 446/459 

Entry HS1292427 from database EMBL: 
human STS SHGC-50338. 
Score = 1737, P = 7.2e-72, identities = 359/369 

Entry HS253344 from database EMBL: 
human STS WI-13893. 
Score = 1578, P = 1.0e-64, identities = 358/397 



Medline entries 



No Medline entry 
Peptide information for frame 3



ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 
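
The ORF coordinates, frame number and peptide length quoted above are mutually consistent, which can be verified with a few lines of code. This is a minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon, the standard genetic code, and a frame defined as ((start - 1) mod 3) + 1; the function names are illustrative. The test fragment is the first 30 nucleotides of the ORF (positions 156-185 of the cDNA listed above).

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reading_frame(start: int) -> int:
    """Frame (1, 2 or 3) of an ORF that starts at 1-based position `start`."""
    return (start - 1) % 3 + 1


def peptide_length(start: int, end: int) -> int:
    """Number of codons between 1-based inclusive ORF coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring blanks in the listing."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(reading_frame(156), peptide_length(156, 917))
print(translate("ATGAA TAGTCGCCAG GCTTGGCGGC TCTTT"))
```

The same arithmetic applies to the other ORFs in this listing, for example positions 1757 to 2383 (frame 2, 209 residues).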



1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD
51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK
101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ
151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT
201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY
251 RFWK



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19f19, frame 3

SWISSPROT: YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME
I., N = 1, Score = 144, P = 8.4e-09

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 138, P = 5.4e-08

>SWISSPROT: YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I. 
Length = 211 

HSPs: 

Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09
Identities = 34/121 (28%), Positives = 67/121 (55%) 

Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128 

LE G+ AETI + + ++ +L + K + +A+QE ++ QQ L IRK + 
Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104 

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 
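
HSP headers of the kind shown above follow a fixed textual pattern, so their numbers can be pulled out mechanically. The sketch below is one possible way to do this with regular expressions; the function name `parse_hsp_header` and the returned field names are assumptions chosen for illustration, and the patterns cover only the header layout seen in this listing.

```python
import re

SCORE_RE = re.compile(
    r"Score\s*=\s*(\d+)\s*\(([\d.]+)\s*bits\)[,.]?\s*Expect\s*=\s*([0-9.eE+-]+)"
)
IDENT_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_hsp_header(score_line: str, identities_line: str) -> dict:
    """Extract score, bits, expect, identities and positives from an HSP header."""
    score = SCORE_RE.search(score_line)
    ident = IDENT_RE.search(identities_line)
    if score is None or ident is None:
        raise ValueError("unrecognised HSP header")
    return {
        "score": int(score.group(1)),
        "bits": float(score.group(2)),
        "expect": float(score.group(3)),
        "identities": int(ident.group(1)),
        "length": int(ident.group(2)),
        "positives": int(ident.group(4)),
    }


hsp = parse_hsp_header(
    "Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09",
    "Identities = 34/121 (28%), Positives = 67/121 (55%)",
)
print(hsp)
```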

Pedant information for DKFZphtes3_19f19, frame 3

Report for DKFZphtes3_19f19.3

[LENGTH] 254
[MW] 29505.73
[pI] 6.99
[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12
[PROSITE] RGD 1
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.12 %
[KW] COILED_COIL 11.02 %




SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT

SEG 

PRD ccchhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc 

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD

SEG 

PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 
MEM

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM MMMMMMM 

SEQ TCLAIALGFYRFWK 

SEG 

PRD hhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMM. . . . 



Prosite for DKFZphtes3_19f19.3
PS00016 15->18 RGD PDOC00016

(No Pfam data available for DKFZphtes3_19f19.3)
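
The start position of the RGD site reported above can be reproduced with a plain substring search. This is a minimal sketch using the first 30 residues of the frame 3 peptide; the helper name `find_motif` is illustrative, and positions are reported 1-based to match the listing.

```python
def find_motif(protein: str, motif: str = "RGD") -> list[int]:
    """Return the 1-based start positions of every occurrence of `motif`."""
    hits = []
    pos = protein.find(motif)
    while pos != -1:
        hits.append(pos + 1)  # convert 0-based index to 1-based position
        pos = protein.find(motif, pos + 1)
    return hits


n_terminus = "MNSRQAWRLFLSQGRGDRWVSRPRGHFSPA"
print(find_motif(n_terminus))
```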
DKFZphtes3_19j17



group: testes derived

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to C. elegans
Y40B1A.2 protein.

The novel protein contains two Prosite WW/rsp5/WWP domain signatures.

The WW domain (also called the rsp5 or WWP domain) was originally discovered as a short conserved
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP
protein, mouse NEDD-4 and yeast RSP5.

The domain is repeated up to 4 times in some proteins. It has been shown to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a
conserved Pro. It is frequently associated with other domains typical of proteins in signal
transduction processes.
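
The proline motif recognised by WW domains is written above in Prosite-style notation. As a rough sketch, such a pattern can be turned into a regular expression and searched for in a candidate ligand. The converter below handles only single residues, bracketed sets, excluded sets in braces and the wildcard x, which is a subset of the full Prosite syntax; the function name and the example peptide GTPPPPYTVG are hypothetical and do not come from the original document.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple Prosite-style pattern such as [AP]-P-P-[AP]-Y to a regex."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element == "x":
            parts.append(".")  # any residue
        elif element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")  # excluded residues
        else:
            parts.append(element)  # single residue or [..] set
    return "".join(parts)


regex = prosite_to_regex("[AP]-P-P-[AP]-Y")
match = re.search(regex, "GTPPPPYTVG")  # hypothetical ligand peptide
print(regex, match.group() if match else None)
```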

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to C.elegans Y40B1A.2 

there are two long ORFs in this cDNA according to EST: 
HS12146/HS75086/AA923755/MMAA17335 remaining intron at Bp 1506-1733 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 2762 bp 

Poly A stretch at pos. 2740, no polyadenylation signal found 



1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG 
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT
1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGAAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 
1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACAGAAT TCTTCTGCCC GATCCACGTG 
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC
2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA 
2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC 
2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA 
2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT 
2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT 
2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA 
2451 TTTTTGAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA
2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA 
2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT 
2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA 
2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA
2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA 
2751 AAAAAAAAAA AA 



BLAST Results 



Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1188I5 map 10p11.2-10p12.1,
complete sequence. 

Score = 2130, P = 0.0e+00, identities = 426/426 
12 exons matching Bp 492-2740 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 



1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 2
No Alert BLASTP hits found



Peptide information for frame 3 



ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116)
WW_DOMAIN_1 (90-116)



1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY 
51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 
101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 
151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 
201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT 
251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 
301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 
351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR
401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG



BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19j17, frame 3

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid
Y40B1A, N = 1, Score = 144, P = 1.8e-09

>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A
Length = 120 

HSPs: 

Score = 144 (21.6 bits), Expect = 1.8e-09, P = 1.8e-09 
Identities = 30/67 (44%), Positives = 43/67 (64%) 

Query: 90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK---DRDYRRE 145

W+E +SSSGK YYYN +TE+SQW+KP EW E +++ K VN P+ DR Y 
Sbjct: 11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69 

Query: 146 VMQATATS 153 

+ Q +++S 
Sbjct: 70 IGQLSSSS 77 

Pedant information for DKFZphtes3_19j17, frame 2

Report for DKFZphtes3_19j17.2

[LENGTH] 209

[MW] 22873.85

[pI] 9.95

[KW] All_Alpha

[KW] LOW_COMPLEXITY 13.40 % 

SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ 

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA 

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 

(No Prosite data available for DKFZphtes3_19j17.2)
(No Pfam data available for DKFZphtes3_19j17.2)
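
The LOW_COMPLEXITY percentage in the report above appears to be the number of residues masked in the SEG rows (shown as x) divided by the peptide length. The sketch below checks this for frame 2, under that assumption; the helper name `masked_percent` is illustrative, and the two masked runs (15 and 13 positions, as read from the SEG row above) are written with string repetition so the counts are explicit.

```python
def masked_percent(seg_rows: list[str], length: int) -> float:
    """Percentage of residues flagged with 'x' across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return 100.0 * masked / length


# SEG rows for the 209-residue frame 2 peptide: only the second row
# carries masked positions, in two runs.
seg_rows = ["", "x" * 15 + " . . " + "x" * 13, "", ""]
print(f"{masked_percent(seg_rows, 209):.2f} %")
```

The same calculation with 13 masked positions over 254 residues gives the 5.12 % reported for DKFZphtes3_19f19.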

Pedant information for DKFZphtes3_19j17, frame 3

Report for DKFZphtes3_19j17.3

[LENGTH] 436

[MW] 47716.62

[pI] 8.71

[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08

[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152c] 6e-04

[BLOCKS] BL01159 WW/rsp5/WWP domain proteins

[PROSITE] WW_DOMAIN_1 2

[PFAM] WW/rsp5/WWP domain containing proteins

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 22.48 % 
SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS

SEG xxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhhhhcccccccccccccccccccccccccccccc 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL 

SEG 

PRD hhcccccceeehhhhhhhhhhhccccccccccccccccccceeeecccchhhhhhhccee 

SEQ VWNGSIMVQRLLQPSG 

SEG 

PRD eeccchhhhhhccccc 



Prosite for DKFZphtes3_19j17.3

PS01159 90->116 WW_DOMAIN_1 PDOC50020

PS01159 90->116 WW_DOMAIN_1 PDOC50020



Pfam for DKFZphtes3_19j17.3



HMM_NAME WW/rsp5/WWP domain containing proteins

HMM *LPsGWEeHWDpsGRpWYYWNHETkTTQWEpP*

+ ++W EH++ SG+ YY+N T+ +QWE+P
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP
DKFZphtes3_1c1



group: signal transduction

DKFZphtes3_1c1 encodes a novel 632 amino acid putative GTPase-activating protein related to
the Drosophila rotund transcript and human n-chimaerin.

The Rac small GTPase is associated with type I phosphatidylinositol 4-phosphate 5-kinase and
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is
expected to act as a GTPase-activating protein for p21rac-related small GTPases.

The new protein can find application in modulating or blocking the response to a cellular
receptor.



similarity to GTPase-activating proteins 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus : unknown 



Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found



1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 
51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 
101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 
151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT 
201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG
251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 
301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 
351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 
401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 
451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 
501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 
551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 
601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 
651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 
701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 
751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT
801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 
851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 
901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 
951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 
1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 
1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 
1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 
1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 
1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 
1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 
1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 
1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 
1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT
1451 CAGCAAAGTG GATGATATCC ATGCTATCTG TAGCCTTCTA AAAGACTTTC
1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 
1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 
1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 
1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 
1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT
1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA
1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 
1851 ATGATGGTGG AGCAAGAGAA CATTGACCCC CTACATGTCA TTGAAAACTC 
1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 
1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 
2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 
2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 
2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 
2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 
2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT
2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 
2301 CAAGTCACAT AATATCAGCA CTTTGTGCTG GTCATTGTTG GGAGCTTTTA 
2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 
2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT 
2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 
2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 
2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT 
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CCCCATTATG AACTGAGCTA
2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGACA GAACTAGTGA
2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 
2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCGTATT TATCTCTGAT 
2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT
2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT
2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 
2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT
3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 
3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 
3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 
3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 
3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA



BLAST Results 



Entry U82984 from database EMBLEST:

Homo sapiens DRES 56 mRNA sequence.

Score = 8775, P = 0.0e+00, identities = 1757/1758

matches 3' end



Medline entries 



93071974:

Developmental regulation and neuronal expression of the mRNA of rat
n-chimaerin, a p21rac GAP: cDNA sequence.

93024458:

A Drosophila rotund transcript expressed during spermatogenesis and
imaginal disc morphogenesis encodes a protein which is similar to human Rac
GTPase-activating (racGAP) proteins.



Peptide information for frame 3 



ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 
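
The relation between the ORF coordinates and the peptide length reported above can be checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the reported end excludes the stop codon; the function name `orf_peptide_length` is hypothetical and not part of the original annotation pipeline.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in a 1-based, inclusive ORF span.

    Assumption: the span covers the coding codons only (no stop codon),
    so its length must be a multiple of three.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3


# ORF from 225 bp to 2120 bp, as listed for frame 3
print(orf_peptide_length(225, 2120))  # 632
```

Under these assumptions the 1896 bp span corresponds to the 632 residues stated for the peptide.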



1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE 
51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI
101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG 
151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK 
201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG
251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC 
301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM 
351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF 
401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD 
451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF 
501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI
551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS 
601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK 
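
As an illustration of how the peptide above follows from the cDNA, the sketch below translates the first codons of the ORF (positions 225 onward in the nucleotide listing) with the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical helpers introduced here, and the sketch assumes an unambiguous A/C/G/T input read in frame.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 36 bases of the ORF, copied from positions 225-260 of the listing
orf_start = "ATGGAT ACTATGATGC TGAATGTGCG GAATCTGTTT"
print(translate(orf_start))  # MDTMMLNVRNLF
```

The result agrees with the first twelve residues of the peptide given for frame 3.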



BLASTP hits 



Entry CEK08E3_4 from database TREMBLNEW: 

gene: "K08E3.6"; Caenorhabditis elegans cosmid K08E3 

Score = 452, P = 2.6e-45, identities = 126/377, positives = 189/377

Entry A48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pel. 7 - fruit 
fly (Drosophila melanogaster) (fragment) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 
Entry B48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit 
fly (Drosophila melanogaster) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 
Entry DM22539_1 from database TREMBL:

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP 
(rotund) gene, complete cds. 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry S29128 from database PIR: 
N-chimerin - rat 

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253 



Alert BLASTP hits for DKFZphtes3_1c1, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_1c1, frame 3



Report for DKFZphtes3_1c1.3



[LENGTH] 632 

[MW] 71026.84 

[pI] 9.08

[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d - 
fruit fly (Drosophila melanogaster) 2e-46 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-11

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-11

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155c] 2e-11

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR127w] 5e-09

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08 

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08 

[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins 

[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins 

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 1e-55

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 1e-49

[PIRKW] breakpoint cluster region 1e-19

[PIRKW] transmembrane protein 7e-08

[PIRKW] brain 3e-22

[PIRKW] alternative splicing 1e-19

[PIRKW] P-loop 2e-25

[SUPFAM] CDC24 homology 3e-22

[SUPFAM] bcr protein 3e-22

[SUPFAM] myosin motor domain homology 2e-25

[SUPFAM] pleckstrin repeat homology 4e-10

[SUPFAM] LIM metal-binding repeat homology 2e-09

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29

[PROSITE] MYRISTYL 6

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 3

[PROSITE] CK2_PHOSPHO_SITE 13

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 1

[PROSITE] DAG_PE_BINDING_DOMAIN 1

[PROSITE] DAG_PE_BINDING_DOMAIN 1 

[PFAM] Phorbol esters / diacylglycerol binding domain 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.22 % 

[KW] COILED_COIL 8.54 % 

SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK

SEG

COILS CCCCCCCCCCCC

1rgp-

SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE

SEG

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

1rgp-

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR

SEG

COILS

1rgp-

SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESIVAKTTVTVPNDGGPIEAVSTIETVP

SEG

COILS

1rgp-

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC

SEG

COILS

1rgp-

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP

SEG

COILS

1rgp-

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS

SEG

COILS

1rgp- .CCHHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCG-GGCCCCHHHHH

SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL

SEG

COILS

1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKVVERLLSLPLEYWS

SEG

COILS

1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS

SEG xxxxxxxxxxx

COILS

1rgp-

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK

SEG xxx

COILS

1rgp-
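
The LOW_COMPLEXITY and COILED_COIL percentages in the report can be related to the SEG and COILS tracks above. The sketch below is an illustration only: it assumes each percentage is the number of masked residues divided by the peptide length, and it takes the run lengths (12 and 42 for COILS, 11 and 3 for SEG) as read from the tracks as transcribed here. The helper `coverage_percent` is a hypothetical name.

```python
def coverage_percent(masked_runs, length):
    """Percentage of a sequence covered by masked runs, rounded to 2 decimals."""
    return round(100.0 * sum(masked_runs) / length, 2)


PEPTIDE_LENGTH = 632          # [LENGTH] from the report
coils_runs = [12, 42]         # lengths of the two 'C' runs in the COILS track (assumed)
seg_runs = [11, 3]            # lengths of the two 'x' runs in the SEG track (assumed)

print(coverage_percent(coils_runs, PEPTIDE_LENGTH))  # 8.54
print(coverage_percent(seg_runs, PEPTIDE_LENGTH))    # 2.22
```

With these run lengths the sketch reproduces the reported 8.54 % and 2.22 %.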









Prosite for DKFZphtes3_1c1.3

PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00004 141->145 CAMP_PHOSPHO_SITE PDOC00004
PS00004 182->186 CAMP_PHOSPHO_SITE PDOC00004
PS00004 246->250 CAMP_PHOSPHO_SITE PDOC00004
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 174->177 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005
PS00005 313->316 PKC_PHOSPHO_SITE PDOC00005
PS00005 392->395 PKC_PHOSPHO_SITE PDOC00005
PS00005 435->438 PKC_PHOSPHO_SITE PDOC00005
PS00005 595->598 PKC_PHOSPHO_SITE PDOC00005
PS00005 606->609 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 270->274 CK2_PHOSPHO_SITE PDOC00006
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006
PS00006 387->391 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 410->414 CK2_PHOSPHO_SITE PDOC00006
PS00006 449->453 CK2_PHOSPHO_SITE PDOC00006
PS00006 489->493 CK2_PHOSPHO_SITE PDOC00006
PS00006 579->583 CK2_PHOSPHO_SITE PDOC00006
PS00007 46->55 TYR_PHOSPHO_SITE PDOC00007
PS00007 376->385 TYR_PHOSPHO_SITE PDOC00007
PS00008 131->137 MYRISTYL PDOC00008
PS00008 150->156 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 377->383 MYRISTYL PDOC00008
PS00008 388->394 MYRISTYL PDOC00008
PS00008 623->629 MYRISTYL PDOC00008
PS00009 303->307 AMIDATION PDOC00009
PS00479 287->336 DAG_PE_BINDING_DOMAIN PDOC00379
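
The short Prosite sites listed above are simple sequence patterns, so individual entries can be checked against the peptide with regular expressions. The sketch below is an illustration, not the tool that produced the listing: it uses the published PROSITE patterns for PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]), and it assumes the listing reports the end coordinate as the start plus the pattern length. The function `scan` is a hypothetical helper.

```python
import re

# Lookahead groups so that overlapping candidate sites are all reported.
PKC_PHOSPHO_SITE = re.compile(r"(?=([ST].[RK]))")      # PS00005: [ST]-x-[RK]
CK2_PHOSPHO_SITE = re.compile(r"(?=([ST].{2}[DE]))")   # PS00006: [ST]-x(2)-[DE]


def scan(pattern, peptide, offset=1):
    """Return (start, end) pairs, 1-based, with end = start + site length."""
    hits = []
    for match in pattern.finditer(peptide):
        start = match.start() + offset
        hits.append((start, start + len(match.group(1))))
    return hits


# Residues 181-190 and 41-50 of the frame 3 peptide
print(scan(PKC_PHOSPHO_SITE, "EKRRSTSRQF", offset=181))  # [(186, 189)]
print(scan(CK2_PHOSPHO_SITE, "RKKWQRTDHE", offset=41))   # [(47, 51)]
```

Both fragments reproduce the corresponding rows of the listing (186->189 and 47->51). Note that a lookahead scan reports every overlapping candidate, which a site-scanning tool may filter differently.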



Pfam for DKFZphtes3_1c1.3



HMM_NAME Phorbol esters / diacylglycerol binding domain

HMM *HrFmrHTFrqPTWCDHCgeFIWGWgKQGYQCQnCgMNCHKRCHelVPmm
H+F+ +T + P +C CG +I +GK ++C +C+++ H +C+ + P
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 334

HMM c
Query 335 C 335
DKFZphtes3_1g13



group: intracellular transport and trafficking

DKFZphtes3_1g13 encodes a novel 1007 amino acid protein with similarity to human 256 kD
golgin.

The new protein contains 7 leucine zippers and seems to be involved in protein-protein
interaction in the Golgi apparatus. The very similar rat cpl51 shows
haploid-specific transcription in Mus musculus testis.

The new protein can find application in modulating protein traffic in the Golgi apparatus,
especially in human haploid germ cells.



similarity to 256 kD golgin, strong similarity to rat "cpl51"
21 exons encoded on AC004682

EST from a testis library, two mouse ESTs of a testis cDNA library,
rat cpl51 shows haploid-specific transcription!
testis or haploid-specific transcription

Sequenced by DKFZ

Locus: map="16q22.2"

Insert length: 3405 bp

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373



1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT 

51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA 

101 AGGTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG 

151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA 

201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT 

251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC 

301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG 

351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT 

401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT 

451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC 

501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA 

551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG 

601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC 

651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA 

701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC 

751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT 

801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA 

851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT 

901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG 

951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA 

1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG 

1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA 

1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT 

1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC 

1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA 

1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA 

1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAGAAA 

1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG 

1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG 

1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG 

1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA 

1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA 

1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA 

1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA 

1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA 

1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC 

1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT 

1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC 

1901 GTATCAAGCA CCAGCACAGG GAGCAAGGCT CCATCAAATG CAAGTTAGAA 

1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT 
2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT

2051 TGCGGCAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA

2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC 

2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA 

2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC 

2251 CTGCAGGCCC AGCTGGACAA AGCTCTGCAG AAGGAGAAGC ACTATCTCCA 

2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG 

2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC

2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA 
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC

2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 

2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 

2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 

2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC

2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC

2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 

2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC

2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA 

2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA

2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 

3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 

3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 

3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC

3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 

3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 

3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 

3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 

3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 
3401 AAAAA 



BLAST Results 



Entry AC004682 from database EMBLNEW: 

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete
sequence.

Score = 1291, P = 0.0e+00, identities = 265/272 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 133 bp to 3153 bp; peptide length: 1007 

Category: similarity to known protein

Prosite motifs: LEUCINE_ZIPPER (83-105) 

LEUCINE_ZIPPER (90-112) 

LEUCINE_ZIPPER (97-119) 

LEUCINE_ZIPPER (104-126) 

LEUCINE_ZIPPER (403-425) 

LEUCINE_ZIPPER (410-432) 

LEUCINE_ZIPPER (918-940)
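
Several of the LEUCINE_ZIPPER motifs listed above overlap (83, 90, 97 and 104 are each seven residues apart), which is what a heptad repeat of leucines produces. The sketch below illustrates this with the PROSITE leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L applied to residues 81-130 of the frame 1 peptide. It is a hedged illustration, not the original scanning tool; `zipper_starts` is a hypothetical helper, and a lookahead is used so that overlapping matches are not skipped.

```python
import re

# PROSITE PS00029: L-x(6)-L-x(6)-L-x(6)-L, wrapped in a lookahead
# so that matches starting inside an earlier match are still found.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def zipper_starts(peptide, offset=1):
    """Return 1-based start positions of all (overlapping) zipper matches."""
    return [match.start() + offset for match in LEUCINE_ZIPPER.finditer(peptide)]


# Residues 81-130 of the peptide listed for frame 1
fragment = "RQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLHH"
print(zipper_starts(fragment, offset=81))  # [83, 90, 97, 104]
```

The four start positions match the first four motifs in the list; a plain non-overlapping search would report only the first of them.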



1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA 
51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH 
101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN 
151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK 
201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS 
251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 
301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM 
351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD 
401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ 
451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 
501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE 
551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS 
601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 
651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ
701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE 
751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF 
801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM 
851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 
901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA 
951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH 
1001 CPGSSYC 

BLASTP hits
Entry HS417401_1 from database TREMBL:

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete
cds.

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862 
Entry SCINTANA_1 from database TREMBL:

Saccharomyces cerevisiae integrin analogue gene, complete cds. 
Score = 404, P = 6.2e-34, identities = 199/897, positives = 423/897 

Entry HS6802_2 from database TREMBL: 

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC 
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain, 
ESTs, CA repeat, STS and GSS.

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028 
Entry AF092090_1 from database TREMBL: 

product: "cpl51"; Rattus norvegicus cpl51 mRNA, partial cds. 

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733 
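
The difference between "similarity" and "strong similarity" in the annotation can be made concrete by converting the identity counts of the BLASTP hits above into percentages. This is a minimal sketch using only the figures quoted in the hit list; the dictionary `hits` and the function `percent_identity` are hypothetical names introduced for illustration.

```python
# (identical residues, aligned length) as quoted in the BLASTP hit list
hits = {
    "HS417401_1": (212, 862),   # trans-Golgi p230
    "SCINTANA_1": (199, 897),   # S. cerevisiae integrin analogue
    "HS6802_2": (231, 1028),    # MYH9
    "AF092090_1": (506, 733),   # rat cpl51
}


def percent_identity(identical, aligned):
    """Identity as a percentage of the aligned length."""
    return 100.0 * identical / aligned


for entry, (identical, aligned) in hits.items():
    print(f"{entry}: {percent_identity(identical, aligned):.1f}%")

best = max(hits, key=lambda entry: percent_identity(*hits[entry]))
print(best)  # AF092090_1
```

On these figures the rat entry is roughly 69% identical over the aligned region, while the other hits are in the range of 22 to 25%.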



Alert BLASTP hits for DKFZphtes3_1g13, frame 1

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin,
N = 1, Score = 411, P = 4.4e-34

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230
mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene,
complete cds., N = 1, Score = 404, P = 7.1e-34



>TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin
Length = 2,185

HSPs: 

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34 
Identities = 212/816 (25%), Positives = 420/816 (51%) 

Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203 

+M + E+ G L +EQL ++ +ERSL+ YR KY ++ ++L+ + K LQ 

Sbjct: 119 DMDSEAEDLVGNSDSLNKEQLI QRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRI YTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262 

G 1+ Q D S RI +Q Q+ +K L E + + ++D I L+ + 

Sbjct: 176 G ILSQSQ DKSLRR1AELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227 

Query: 263 LAC SNALVLEREKALIKLQADFASCTATHRYPPSSSEEC-ED--IKKILKHLQE 313 

++ + + ++ K L +L+ A P S E ED K L+ LQ+ 

Sbjct: 228 VSLLKQRLRNGPMNVDVLKPLPQLEPQ-AEVFTKEENPESDGEPVVEDGTSVKTLETLQQ 286 

Query: 314 QKDSQ CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ Q C ++ ++ L E EA+ EQ ++++ K++ DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 344

Query: 367 HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV--QNSL 424 

+ +D I Q Q+ + ET++ + + L+ K+E + +L ++ Q+ Q 
Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVIAETKR QMHETLEMKEEEIAQLRSRIKQMTTQGEE 400 

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 
Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ — KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456 

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL ++LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK — LHEKELARKEQELTKKLQTRERE — FQEQMK 512 

Query: 543 TSNRKRVEELSLELSEALRKLEMSDKEKRQLQKT — VAEQDMKMNDMLDRIKHQHREQGS 600 

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ + 
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED KREQLKKSKEHEKLMEG ELEALR-QEFKKKDKTL 651 

++ LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE IALQKESLMS 706 

+ ++ + E E LR + C + E+ L +K Q I+++N++ + +++ L S 

Sbjct: 632 QVLKQQYQTEMEKLREK CEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDVKQTELES 688 

Query: 707 LQAQLDKALQKEKHYLQT— TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764 
L ++L + L K +H L+ ++ K+ D + ++ A D+ Q V S K + 
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824 

S+ +T+ KA L+++I E +K+ + L++ + + E ++ + +L++ S ++ 
Sbjct: 746 SIQRTE--KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802 

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ + E L + + KD C 

Sbjct: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937 

L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + + ++EN 

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT — QV-YESKLEDGNKEQEQTKQILVEKENM 912 

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 
Sbjct: 913 ILQMREGQKKEIEILTQKLSAK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26
Identities = 216/953 (22%), Positives = 468/953 (49%)

Query: 2 KDEAGERDRE— VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM EE AM 51 

K+E E D E V S K L +LQ +K ++ KR + +T+Q + + C +EA+ 
Sbjct: 260 KEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319 

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ LQQLK— KKLLVLQQELEFHTEELQ 105 

D++ + + + + + LR ++QL+ K +++ + + + H E L+ 

Sbjct: 320 QEQLDERLQELEKI KDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVI AETKRQMH-ETLE 378 

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164 

+ Q +S +++ T+ L K K + EE +T +K A+ +L 

Sbjct: 379 MKEEEIAQLRSRIKQMTTQGEELREQ-KEKSERAAFEELEKAL STAQKTEEARRKLK 434 

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222 

A + + I ++E++ R Q LS + E+++ K + ++ + Q+ K K 

Sbjct: 435 AEMDEQIKTIEKTSEEERISLQQELSRVKQEWDVMKKSSEEQIAKL--QKLHEKELARK 492 

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282 

+ T +E +E Q+++ +K SQ + L ++ + +L LE ++LQ 
Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL— KISQEKEQQESLALEE LELQK 544 

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341 

AT+ +E E + + L+ + ++E +N KDL V LEA + + 

Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS LQENKNQSKDLAVHLEAEKNK 600 

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400 

+1 + K + +L L+ + A K++ Q +++L+ E E +K TL KD 
Sbjct: 601 HNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659 

Query: 401 K FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453 

K ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK 

Sbjct: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717 

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511 

K E E K + + + ++ ++ ++ Q+ + K++ L++ + L++ 

Sbjct: 718 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQ 776 

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570 

++ +AD 1+ + ELQ + + + Q++ ++ + +L++ +KL + + E+ 
Sbjct: 777 AHVENLEAD-IKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835 

Query: 571 RQLQKTVAEQDMKMNDM LD — RIKHQHREQGSIK — CKLEEDLQEATKLLEDKREQL 623 

L K VAE + + D+ LD +1+ Q Q K ++E+ ++ T++ EKE 
Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895 

Query: 624 KKSKEHEK — LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681 

K+E K L+E E L+ +K K ++ ++KL + +++ + T+ ++ 
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741 

K +Q +++ + + K+ L+ +A+L K L E L+ + ++ ++A + A 
Sbjct: 955 KMEKVKQKAKEMQETL KKKLLDQEAKLKKEL-7ENTALELSQKEKQFNAKMLEMAQA 1009 

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800 

++ A+ +L T++ + ++ SLT+ + +L + I +E KKLN + +L+ 
Sbjct: 1010 NSAGISDAVSRLE--TNQKEQIE-SLTEVHRR— ELNDVISIWE KKLNQQAEELQEI 1061 

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW — QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858 

H E+++ ++++ E+ ++L + +K+ N ++ KEE +++ + L+E L 

Sbjct: 1062 H EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116 
Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ--WAKQQKVANEKLGNQLREQVNYI- 915

+ L Q K L + + +L++ + ++Q V + L + + +V+ + 

Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951 

+KL + S+ ++ NK L+ K +E KK+ E 
Sbjct: 1176 SKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE 1212 

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26 
Identities = 215/951 (22%), Positives = 433/951 (45%) 

Query: 10 REVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALAFEESE 69 

+E + +++L L+ ++ K Q K L + EA + H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL EAEKNKHNKEIT — VMVEKHK 613 

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129 

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK— H-QQDALWTEKLQVLKQQYQTEMEKLREK CEQEKETLLKD-KEI I FQA 666 

Query: 130 HHCKLKE DEVILYEEEMGNHNENTGEKL HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 

Sbjct: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K YQSSLSNIELLECQVKMLQGE--LGGIMGQEPENKGDHSKVRIYTSPCMIQEHQE 237 

+ +Q++I+E+V++EL + Q +K+ +++ 

Sbjct: 727 EQKNHHQQQVDSI-IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
Sbjct: 785 DIKRSEGELQQASAKLDVFQSYQS AT HEQTKAY EEQLAQLQQKLLDLE - TERI L 837

Query: 298 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVSLEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE + + +K + ++ E 
Sbjct: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891 

Query: 357 LH-GLREETSAHIERKDKDITILQCRL-QELQLEFTETQKLTLKKDKF — LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+- + +E TQKL+ K+D L E+ E + 

Sbjct: 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEIL-TQKLSAKEDSIHILNEEYETKFK 950 

Query: 412 ELEKKLTQVQNSLLK KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466 

EKK+ +V+ + K+K L+ + + + ELE T E Q K K+ K L+ Q 

Sbjct: 951 NQEKKMEKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQ-KEKQFNAKMLEM-AQ 1008

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526

+ +A RL Q Q + + L D +K L Q+A+ +QE+ 
Sbjct: 1009 ANSAGISDAVS--RLETNQKEQIESLTEVHRRELNDVISIWEKKL NQQAEELQEIH- 1062

Query: 527 ELQMLQKESSMAEKEQT SNRKRV EELSLELSEALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122

Query: 575 KTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L++++ + L+E LE LE+++++ K+ 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E 4+L+ +K +K+L++ S + ++ 4-E L +L C + E+ L T++ + + 
Sbjct: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT QALE 750 

K A+ + Q + K KE ++T E +A R+ Q+ L QA 

Sbjct: 1242 KTNAILSR-ISHCQHRTTKV--KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLN TELRK--LRGFHQESE 805

+L ++ KS++ + +K L++E ++ + T+L+K + + 

Sbjct: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTL 1357 

Query: 806 LEVHAFDKKLE--EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863

++ +KK+E +S Q+ Q QN + L+ KE + +++ K LL D + 

Sbjct: 1358 MKEELKEKKVEISSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE QVNYIAKLSG 920

++ K+ D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475

Query: 921 EKDH-LHSVMVHLQQENKK LKKEIEEKKMKAE 951 

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQINLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25
Identities = 209/953 (21%), Positives = 438/953 (45%) 
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNS SHD 56 

MK + E+ ++ L+ K L+ + + + + R+R+ + ++ +E++ + S + 

Sbjct: 470 MKKSSEEQIAKLQKLHEKELARK-EQELTKKLQTREREFQEQMKVALEKSQSEYLKISQE 528 

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116 

K+Q ++LA EE E++ K+ L + + KL LQQE E + + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAILTESEN KLRDLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE — VILYEE EMGNHNENT — GEKLHLAQEQLALA 167 

+ Q+ DL + K K ++ ++ E+ E H ++ EKL + ++Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL — ERSLNLYRDK YQSSLS — NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

+K+ + L +DK +Q+ + N + LE ++ + Q EL + + E 

Sbjct: 642 MEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700 

Query: 221 SKVRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ S ++++ +T K E+ K+ +Q + Q+ + + + + ++R + +K 
Sbjct: 701 HKLEEELS--VLKD— QTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKD 756 

Query: 281 QADFASCTATHR— YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+++ +K + + ++ +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398 

EQ + + ++ LE + L ++ A +E + KD+ C EL + Q L + 

Sbjct: 817 YEEQLAQLQQKLLDLETERILLTKQV-AEVEAQKKDV CT--ELDAHKIQVQDLMQQ 869 

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 

+K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 
Sbjct: 870 LEK QNSEMEQKV-KSLTQVYESKLEDGNKEQEQTKQILVEKENMILQMREGQKKEIE 925 

Query: 458 C--KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK — LQKGLLL 513 

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKAK EMQETLKKKLLDQEAKL 981 

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

K+ +T EL Q+E Q K MA+ V L E + L ++ +R+ 

Sbjct: 982 KKELENTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL — TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKS KE 628 

L ++ + K+N + ++ H Q K + +L++ L ++E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688 

+ L L+++ K+K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV--NS--LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL DKALQ — KEKHYLQTTITKEA YDALSRKSAA 740 

+ +L K + L ++L D+ Q K H ++ + LS + A 

Sbjct: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKL 790 

Q D+ T+AL E +N +S+T ++ ++ Q + +++E ++ + +L 

Sbjct: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+LR+L + +LEE Q+ K + D++ L ++E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNISFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

Sbjct: 1328 G--NQQQAASEKESC-ITQ— LKKELSE NINAVTLMKEELKEKKVEISSLSKQLTD 1378 

Query: 911 QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951 

Q+ LS ++ + S+ +E +L ++++ K + 

Sbjct: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422 

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25 
Identities = 226/941 (24%), Positives = 444/941 (47%) 

Query: 61 QALAFEESEVE--FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

Q L E+ +++ S+ LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENT GEKL HLAQEQLALA 167 

+ QSL + + D+ ++E+ EN GE+ + + L 

Sbjct: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284 

Query: 168 GDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYT 227 
++ EL ++ QS LL + + LQ +L + QE E D + + 

Sbjct: 285 QQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERL-QELEKIKD LHMAE 340 

Query: 228 SPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASC 287 

+1 +++++++ Q +1 E + ++ L ++ E+ + +L++ 

Sbjct: 341 KTKLITQLRDAKNLIEQLEQDKGM VIAETKRQM — HETLEMKEEE-IAQLRSRIKQM 394 

Query: 288 TATH RYPPSSSEEC--EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL RVE 335 

T R SE E+++K L Q+ ++++ E +K + R+ 

Sbjct: 395 TTQGEELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERIS 454 

Query: 336 LEA-VSEQKRNIMKDMMKL — ELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTET 392 

L+ +S K+ ++ D+MK E + L++ + RK++++T +LQ + EF E 

Sbjct: 455 LQQELSRVKQEVV-DVMKKSSEEQI AKLQKLHEKELARKEQELTK KLQTREREFQEQ 510 

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452 

K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+ 

Sbjct: 511 MKVALEKSQ — SEYLKISQEKEQ QESLALEELELQKKAIL-TESENKLRDLQQE- 561 

Query: 453 SKEAECKALQ AEVQKL KNSLEEAKQQER LA AQQAAQCKEEAALAGCHLEDTQR-K 506 

++ + L+ E L+ SL+E K Q + L A++ KE + H + + K 

Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLK 620 

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLEN 565 

Q+ L ++ Q+ Q E++- L +E EKE K + + E K LE 

Sbjct: 621 HQQDALWTEKLQVLKQQYQTEMEKL-REKCEQEKETLLKDKEII-FQAHIEEMNEKTLEK 678 

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR — E 621 

D ++ +L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K + 

Sbjct: 679 LDVKQTELESLSSE LSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQ 733 

Query: 622 QLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN — ENLRAELQCCSTQL 676 

Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ + L 

Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793 

Query: 677 ESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736 

+ + K + Q +++ +E L LQ +L L+ E+ L TK+ + + + 

Sbjct: 794 QQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERILL TKQVAEVEAQ 848 

Query: 737 KSAACQDD LTQAL EKL NHVT SE TKSLQQSLTQTQEKKAQ — — LEEEIIAYEE 785 

K C + DL Q LEK N SE + +SLTQ E K + +E+ + 

Sbjct: 849 KKDVCTELDAHKIQVQDLMQQLEKQN SEMEQKVKSLTQVYESKLEDGNKEQEQTKQI 905 

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQL 843 

+ +K N L-H G Q+ E+E+ +E S +L +++ + +N K + +++ 

Sbjct: 906 LVEKENMILQMREG--QKKEIEILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKA 963 

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899 

+E QE LK+ LL+ + + L++ L+ Q + + A+ 

Sbjct: 964 KEMQE TLKKKLLDQEAK LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK — EIEEKKMKAENTRL 955 

A +L +EQ+ + ++ E + + S+ L Q+ ++L++ EI+ ++ + E L 

Sbjct: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAEL 1076 

Query: 956 CTKALGPSRTESTQREKVCGTLGKKGLPQD 985 

K L E + K L +G+ QD 

Sbjct: 1077 KQKIL-LFGCEKEEMNKEITWLKEEGVKQD 1105 




Score = 326 (48.9 bits), Expect = 6.0e-25, P = 6.0e-25 
Identities = 220/907 (24%), Positives = 444/907 (48%) 




Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILE KQTS 123 

E+E G+S + QL Q +++ EL T+Y L++ + L+ Q+ 

Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182 

Query: 124 DLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNLYRD 183 

D L +L+E+ + +++ H + E+ + E+ 1+ L+ ++L + 

Sbjct: 183 DKSL-RRIAELREE--LQMDQQAKKHLQ EEFDASLEE KDQYISVLQTQVSLLKQ 233 

Query: 184 KYQSSLSNIELLECQVKMLQGELGGIMGQE-PENKG DHSKVR-IYTSPCMIQEHQ 236 

+ ++ N+++L+ + L+ + +E PE+ G D + V+ + T ++ + 

Sbjct: 234 RLRNGPMNVDVLK-PLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQE 292 

Query: 237 ETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPS 296 

KR E Q +Q L+ K A L ER + L K++ D T 

Sbjct: 293 NLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHMAEKTKLIT 346 

Query: 297 SSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELD 356 

+ D K +++ L++ K + E +++L++E++QR++KM + 

Sbjct: 347 QLRDAKNLIEQLEQDKGM — VIAETKRQMHETLEMKEEEIA-QLRSRIKQMTTQGEE 400 

Query: 357 LHGLREETS-AHIERKDKDITILQCRLQE LQLEFTETQKLTLKKDKFLQEKDEMLQ 411 

L +E++ A E +K ++ Q + +E L+ E E K T++K +E+ + Q 

Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TIEKTSE-EERISLQQ 457 

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+++ K Q + E E+ KE Q+ +K+ + + + + Q +K 

Sbjct: 458 ELSRVKQEVVDVMKKSSEEQIAKLQKLH-EKELARKE--QELTKKLQTREREFQEQ-MKV 513 

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+LE++ QEL Q++EAL L+ ++LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572 

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587 

+ ES+E+S VLE++ +++ +K K +L+ +QD + 

Sbjct: 573 ES-SLEKSLQENKNQSKDLAVH-LEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE— QLKKSKEHEKLMEGELEALRQEF 644 

L +K Q++ E ++ K E QE LL+DK Q + +EK +E +L+ + E 

Sbjct: 631 LQVLKQQYQTEMEKLREKCE QEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTEL 686 

Query: 645 KKKDKTLKE — NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE — IA 698 

+ L E +R KLEEE L+ + +LE+ + + + N QQ + + KE ++ 

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVS 746 

Query: 699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS AACQDDLTQAL 749 

+Q+ E + Q QL+ L +++KH + E +A ++S A+ + D+ Q+ 

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 806 

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ ++ L Q Q+K LE E I +++ ++ + + + +++V 

Sbjct: 807 QSATH — EQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864 

Query: 810 AFDKKLEEMSCQVLQWQKQHQN--DLKMLAAKEEQLREFQEEMAALKENLL EDDKE 863 

++LE+ + ++ Q K + K+ +EQ E +++ KEN++ E K+ 

Sbjct: 865 DLMQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ--EQTKQILVEKENMILQMREGQKK 922 

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE — KLGNQLREQV-NYIAK 917 

L Q S +D+ + N++ T + Q K +KV + ++ L++ + + + AK 

Sbjct: 923 EIEILTQKLSAKEDSIHIL — NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980 

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKALGPSRTESTQREKV 973 

L K L + + L Q+ K+ E M N+ + A+ SR E+ Q+E++ 

Sbjct: 981 L KKELENTALELSQKEKQFNAKMLE — MAQANSAGISDAV — SRLETNQKEQI 1029 


Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24 





Identities = 184/827 (22%), Positives = 405/827 (48%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59 

++ E G + + S S + L+ ++ + ++ L++ ++ -H D Q 

Sbjct: 1323 LQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q +++ EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSISLSEKEAAISSLR KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438 

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIA 172 

E K+ + H +KE ++ L + + ++ E+++L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD--EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNIEL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRIYTSPC 230 

L+ + + K + SN+E L+ Q + EL + Q+ E + + ++ Y 
Sbjct: 1497 CLKGEMEDDKSKMEKKESMLETELKSQTARIM-ELEDHITQKTIEIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290 

I EH+E ++L + ++D+ ++E K+ L LE + +K + + 

Sbjct: 1556 DI-EHKELVQKLQHFQELGEEKDNRVKEAEEKI LTLENQVYSMKAELETKKKELE 1609 

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + L+ + ++ +++++++ +L + E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEK EE 1664 

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL — TLKKDKFLQEKD 407 

K + H E + ++ +++++ IL+ +L+ ++ +ET + + K E++ 
Sbjct: 1665 QYKKGTESH--LSELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQE 1722 

Query: 408 EM LQEL-EKKLTQVQNSLLKKEKEL EKQQCMATELEMTVK-EAKQDKSKE 455 

E +Q+ E+K++ +Q +L +KEK L EK++ +++ EM + + + K + 

Sbjct: 1723 EADSQGCVQKTYEEKISVLQRNLTEKEKLLQRVGQEKEETVSSHFEMRCQYQERLIKLEH 1782 

Query: 456 AECKAL--QAEVQKLKNSLEEAKQQERLAAQQAAQCK — EEAALAGCHLEDTQRKLQKGL 511 
AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L 

Sbjct: 1783 AEAKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842 

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS — LELSEALRKLENSDKE 569 

++K T Q L+++++ L +S + +++ +R +EEL+ E +AL++++ +K 
Sbjct: 1843 QEKELTCQILEQKIKEL — DSCLVRQKEV-HRVEMEELTSKYEKLQALQQMDGRNKP 1896 

Query: 570 KRQLQKTVAEQD MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625 

L++ E+ + +L ++ QH + E + Q+ K + ++ L+ 

Sbjct: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685 

KEH++ ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 

Sbjct: 1957 RKEHQQ ELEILKKEYDQ EREEKIKQEQEDL — ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744 

Q Q+L I ++A+L ++ Q+E +L IEDLR+A ++ 

Sbjct: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMK — KLNTELRKLRGFH 801 

+ A E+ +T++ + LQ L + Q+K Q LE+E + + +L T+L + 

Sbjct: 2062 ILDAREE--EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119 

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 

+S+L+ F +++ + ++ +++K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 

Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24 
Identities = 213/977 (21%), Positives = 454/977 (46%) 

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61 

E R+ +V S+ K L+ Q + ++ +H++ +QK++L++ +++ 
Sbjct: 1034 EVHRRELNDVISIWEKKLNQQAEELQEIHEI-QLQEKEQEVAELKQKILLFGCEKEEMNK 1092 

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL VLQQE — LEFHTEELQTSYYSLRQY 114 

+ + + E G + L +LQ QLK+K + Q E L+ H E+L+ + 

Sbjct: 1093 EITWLKEE GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLNKSLKE 1149 

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLH L AQEQLALAGDK I 171 

+ L++Q +L +L K K E+ + +E +++ EK + + E +L K+ 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKL 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS — NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228 

+ L L++ K ++ L EL+ L 1 +++ K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINISSSKTNAILSRI— SHCQHRTTKVKEALLIK 1267 

Query: 229 PCMIQEHQ ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280 

C + E + E Q L+ +Q+ + Q ++ + +++ A +LV E+E L 
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKENQIKSMKADIESLVTEKEA L 1323 

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q++ + SECI++KLE++LEE +K+ +VE+ ++S 
Sbjct: 1324 QKEGGN QQQAASEKESC — ITQLKKELSENINAVTLMKEE LKEKKVEISSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQEL — QLEFTETQKLT-L 397 

+Q ++ + + L S+ ++ D++ L ++Q+L +++ +K++ L 

Sbjct: 1374 KQLTDLNVQLQN-SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK-DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV KEAKQDKS 453 

++ D + + E ++ + + TQ QN++ + + +LE + A E + + KE ++ 
Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQN 1492 

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512 

K +C + E K K +E+ + L +Q A + E + +E ++ ++ K 

Sbjct: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

++QK +EL ++LQ Q+ + +++ L ++ +LE KE 

Sbjct: 1552 -NQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL EDKREQLKKSK 627 

+ +V ++ ++ + DR++ + + +K K E+ + K L E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL RAELQCCSTQLESSLNK 682 

E EL QE +++ L+E + +E ++E L A+ T+ E + ++ 

Sbjct: 1671 ESHL SELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727 

Query: 683 YNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739 

T ++ I L + + +KE L+ Q+K H+ +E L A 

Sbjct: 1728 GCVQKTYEEKISVLQRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLIKLEHAEA 1785 
Query: 740 ACQDDLTQALEKLNHVTSET — KSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKL 797 

+D Q++ + H+ E K+ + SL Q + + + I ++ ++ + +++K 
Sbjct: 1786 KQHED — QSM — IGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKT 1841 

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856 

QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K 

Sbjct: 1842 L QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKYEKLQALQQMDGRNKPT 1897 

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915 

LLE++ E PK + ++ + L A+++K +KLG ++ + 
Sbjct: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK QKLGKEIVRLQKDL 1953 

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972 

L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+ 

Sbjct: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNST--LKQLMREFNTQLAQKEQ 2010 

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22 
Identities = 221/952 (23%), Positives = 441/952 (46%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL CMEEAMNSSHD- 56 

+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D 
Sbjct: 1160 LKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219 

Query: 57 — KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT— -EELQTSYY 109 

KK L + +E + SSK L ++ + + +++ L T EL+ 

Sbjct: 1220 CCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279 

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE QLAL 166 

L + Q+ L H + KE+++ + ++ EK L +E Q 

Sbjct: 1280 QLTEEQNTLNISFQQAT HQLEEKENQI KSMKADIESLVTEKEALQKEGGNQQQA 1333 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

A +K E + + + +++ + L++ ++K + E+ + Q + V+ + 

Sbjct: 1334 ASEK ESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384 

Query: 227 TSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFAS 286 

S + ++ + ++ + D +Q+L K+ + L E+ AL ++ D+++ 
Sbjct: 1385 NSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKV DTLSKEKISALEQVD-DWSN 1440 

Query: 287 CTATHRYPPSS— SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337 

+ + S ++ +K++ L E K + +E NL+K+ R + L+ 

Sbjct: 1441 KFSEWKKKAQSRFTQHQNTVKELQIQL-ELKSKEAYEKDEQINLLKEELDQQNKRFDCLK 1499 

Query: 339 AVSEQKRNIM-KDMMKLELDLHGLRE ETSAHIERKDKDITILQCRLQEL-QLEFTET 392 

E ++ M K LE +L E HI +K +1 L L+ Q + E 

Sbjct: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559 

Query: 393 QKLTLKKDKFLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449 

++L K F + EKD ++E E+K+ ++N + + ELE ++ + ++VK 
Sbjct: 1560 KELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVK 1616 

Query: 450 QDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQK 509 

SKE E KAL + ++ S + + +R A Q+ A K++ +E+ + + +K 

Sbjct: 1617 SKEEELKALEDRLES — ESAAKLAELKRKAEQKIAAIKKQLL SQMEEKEEQYKK 1668 

Query: 510 GLLLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568 

G + +T +QE +RE+ +L+++ EQ+ + S+ A+E+D 

Sbjct: 1669 GTESHLSELNTKLQEREREVHILEEKLKSVESSQSETL— IVPRSAKNVAAYTEQEEADS 1726 

Query: 569 E KRQLQK-TVAEQDMKMND-MLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQ 622 

+ K +K +V ++++ + +L R+ Q +E+ ++ E Q +L+ K E 

Sbjct: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI --KLEH 1782 

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680 

+ +K+HE +MGLEL++KK +++ KE N++A+ LE 
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QNLE 1832 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKAL— QKEKHYLQTTITKEAYDALSR-K 737 

N ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L + 

Sbjct: 1833 NVFDDVQKTLQE— KELTCQ— ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888 

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMKKLNTEL — 794 

++ T+ LE+ S++ +Q L E + LE ++ E +KL E+ 

Sbjct: 1889 QMDGRNKPTELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVR 1948 

Query: 795 — RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852 

+ LR +E + E+ K+ ++ + ++ Q+Q +LK + ++ +REF ++A 
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLMREFNTQLAQ 2007 

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912 
++ L KE Q V + + Q TN Q K K+A EK + R 

Sbjct: 2008 KEQELEMTIKETINKAQ-EVEAELLESH QEETN — QLLK — KIA-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952 

Y L ++ + + + LQ + ++L+K+ ++K + EN 
Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22 
Identities = 195/961 (20%), Positives = 435/961 (45%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN— LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 

+KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 

Sbjct: 657 LKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 

+ E E + K H +Q+ + K+ V +Q+ + +++ L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG NHNENTGEKLHLAQEQLAL AGDKIASL 174 

L++ + + L K E E+ ++ ++ T E+ +EQLA K+ L 

Sbjct: 772 LKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDL 831 

Query: 175 ERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQ-EPENKGDHSKVRIYTSPCMIQ 233 

E L + + + + ++ + ++ +M Q E +N KV+ T 

Sbjct: 832 ETERILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQ-VYES 890 

Query: 234 EHQETQKRLSEVWQKVSQQDDLIQELRN KLACSNALVLEREKALIKLQADFASCTA 289 

+ ++ K + Q + ++++ + I ++R ++ + +E ++ L ++ + 
Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHILNEEYET 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

++ + ++ E +K+ K +QE + L E L K+L +S +++ + 

Sbjct: 948 — KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA— KLKKELENTALELSQKEKQFNAK 1002 

Query: 350 MMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 

M+++ + + G+ + S +K++ ++ + +EL + +K ++ + LQE 

Sbjct: 1003 MLEMAQANSAGISDAVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIH 1062 

Query: 408 EM-LQELEKKLTQVQNSLLK KEKELEKQQCMATE LEMTVKEAKQD-KSKEAEC 458 

E+ LQE E+++ + + + +L +++E+ K+ E + T+ E ++ K K A 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+L + KLK LE+ + + ++ +E+ E+ 4RK+ + L K K 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE— LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

T +E Q +K + E + +K EEL+++L +K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN— ELIN 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

K N +L RI H QHR K++E L T + + QL++ E + + 

Sbjct: 1238 ISSSKTNAILSRISHCQHRTT KVKEALLIKTCTVSELEAQLRQLTEEQNTLNISF 1292 

Query: 638 EALRQEFKKKD KTLKENSRKLEEENENLR AELQCCSTQLESSL 680 

+ + ++K+ K+ + K + L E E L+ +E + C TQL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK +++ K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +++ E +++ ++L++ + + + + ++D + KE L E++A+E 

Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916 

LED + + T + N+ ++ H Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLNE-VLKNYNQ QKDIEHK ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

+L EKD+ ++ L+ + +K E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22 
Identities = 207/886 (23%), Positives = 412/886 (46%) 
Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106 

+EN++ Q EEE+SK ++L+ LQ+E + 

Sbjct: 1281 LTEEQNTLNISFQQATHQLEEKENQIKSMKA DIESLVTEKEALQKEGGNQQQAASE 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ +++ + N + L++++ A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAA- 1395 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIY 226 

I + SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWK-K 1447 

Query: 227 TSPCMIQEHQETQKRLS EVWQKVSQQDDLIQEL — RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI--TILQCRLQEL 385 

LV+ L+ + + E+K N +K+ + L L A +E K K++ L + +E 

Sbjct: 1562 LVQKLQ-HFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+L+ E + L+ + + E+ ++ E+K+ ++ LL + +E E+Q TE ++ 
Sbjct: 1621 ELKALEDR LESES-AAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGTESHLSE 1676 

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504 

K + +E E L+ +++ +++S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYE 1735 

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E ++L A K 

Sbjct: 1736 EKIS — -VLQRNLTEKEKLLQRVGQ— EKEETVSSHFEM-- RCQYQERLIKLEHAEAKQH 1788 

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG— SIKCK— LE EDLQ E 611 

LQ+ + E++ K + ++ +H +E G +1+ K LE +D+Q E 
Sbjct: 1789 EDQSMIGHLQEELEEKNKKYSLIV--AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846 

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T ++LE K ++L +K + E+E L +++K '+ + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMDGRNKPTELLEENTEE 1906 

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L+KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVRLQKDL-RMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYE 784 

I K+ YD R+ Q+ + LE L H ++ + +++ TQ +K+ +LE I + 
Sbjct: 1965 EILKKEYDQ-EREEKIKQEQ--EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E+K +L HQE E + KK+ E + + K+++ ++L A+EE++ 
Sbjct: 2018 ETINKAQEVEAELLESHQE ETNQLLKKIAEKDDDLKRTAKRYE EILDAREEEMT 2071 

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903 

++ EL+++ LQ PD + ++TLQK + + + K 

Sbjct: 2072 AKVRDLQTQLEELQKKYQQK — LEQEENPGNDNVTIM ELQTQLAQ — KTTLISDSK 2123 

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ++ + +L + ++++ V HL 
Sbjct: 2124 LKEQEFREQIHNLEDRLKKYEKNVYATTVGHL 2155 

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20 
Identities = 209/938 (22%), Positives = 432/938 (46%) 

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

+ + ++ +E+ +L KLL + +K L + + +K Q N +E A NS+ 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + +++L EELQ + ++ + 
Sbjct: 1017 AVSRLETNQKE-QIESLTEVHRRELNDVISIWEKKLNQQAEELQ-EIHEIQLQEK — 1069 

Query: 119 EKQTSDLV — LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E++ ++L +L C+ +E ++ I + +E G + T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI— MGQEPENKGDHSKVRIYTSPCMIQ 233 
L + +K + L N L E LQ +L + + +E + K ++ T+ Q 
Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT — FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186 

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL— AC — SNALVLEREKALIKLQADFA 285 

H+++ K L + K + L +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 

+ + + + + ++I + ++Q + E QN + + E+K 

Sbjct: 1244 NAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLRQLTEEQNTLNISFQQATHQLEEKE 1303 

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402 

N +K M K +++ L +E + + + + + +L+ E +E +TL K++ 

Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462 

L+EK + L K+LT + N L+ L +++ +L EK+ ++ L 

Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ — DLS 1418 

Query: 463 AEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518 

+V L A +Q + + ++ K++A ++T -f+LQ L L ++A 

Sbjct: 1419 FKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

+ I L+ EL K + E ++ ++E+ L +L++ +L+ + 

Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637 

++ +++ + + +K+ + +Q 1+ K 1 t LQ +L E+K ++K+++E +E ++ 
Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDIEHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594 

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 

+++ E+KKL+ + +++E L+A L+ +LES S K ++ + ++ 

Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE DRLESESAAKL AELKRKAEQK 1647 

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756 

IA K+ L+S Q++ +KE+ Y + T + L+ K + ++ EKL V 

Sbjct: 1648 IAAIKKQLLS QME EKEEQYKKGT— ESHLSELNTKLQEREREVHILEEKLKSVE 1699 

Query: 757 S ET KSLQQSLTQTQEKKAQLEEEII-AYEERMKKLNTELRKLRGFHQESELEV 808 

S ET +S + T++++A + + YEE++ L L EE + 

Sbjct: 1700 SSQSETLIVPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752 

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 

++ EE + + Q+Q L L E + E Q + L+E L E +K+ + 

Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLIV 1812 

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYIAKLSGEKDHL 925 

V K+ + N Q NLE + QK EK L Q+ EQ + + + + 

Sbjct: 1813 AQHVEKEGGK NNIQAKQNLEN VFDDVQKTLQEKELTCQILEQKIKELDSCLVRQKEV 1869 

Query: 926 HSV-MVHLQQENKKLK 940 

H V M L + +KL+ 
Sbjct: 1870 HRVEMEELTSKYEKLQ 1885 

Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14 
Identities = 160/716 (22%), Positives = 318/716 (44%) 

Query: 233 QEHQETQKRLSEVWQKVSQQDDLIQE-LRNKLACSNALV-LEREKALIKL-QADFASCTA 289 

+E +TQ ++ +V + L + ++ L S++ LR+ L + DSTA 
Sbjct: 53 RESGDTQSFAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTA 112 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349 

+ P E ED+ L +++ QL + + R+ + + + ++ 

Sbjct: 113 SFDPPSDMDSEAEDLVGNSDSLNKEQLIQRLR — RMERSLSSYRGKYSELVTAYQMLQRE 170 

Query: 350 MMKLELDLHGLREETSAHIERKDKDIT-ILQCRLQELQLEFTETQKLTLKKDKFLQEKDE 408 

KL+ G+ ++ +DK + I + R +ELQ++ + L + D L+EKD+ 

Sbjct: 171 KKKLQ GILSQS QDKSLRRIAELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAE V 465 

+ L+ +++ ++ L ++ + + +LE + ++++ E++ + + + V 

Sbjct: 220 YISVLQTQVSLLKQRLRNGPMNVDVLKPLP-QLEPQAEVFTKEENPESDGEPVVEDGTSV 278 

Query: 466 QKLKNSLEEAKQQERLA--AQQAAQC-KEEAALAGCHLEDTQRKLQKGLL-LDKQKADTI 521 

+ L+ + K+QE L ++ Q KE+ L E Q +L + L L+K K + 

Sbjct: 279 KTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHM 338 

Query: 522 QELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQD 581 

E++L+ ++E++ +E ++EL E +R K+Q 

Sbjct: 339 AEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMHETLEMKEEEIAQLRSRIKQMTTQG 398 
Query: 582 MKMNDMLDRIKHQHREQGSIKCKLEEDLQEAT-KLLEDKREQLK KSKEHEKL-MEGE 636
+ + + ++ + E+ + +EA KL + EQ+K K+ E E++ ++ E
Sbjct: 399 EELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERI SLQQE 458

Query: 637 LEALRQEFKK-KDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNK 695
L ++QE K+ +E KL++ +E EL +L L T ++ Q+ K
Sbjct: 459 LSRVKQEWDVMKKSSEEQIAKLQKLHEK ELARKEQELTKKLQ TREREFQEQMK 512

Query: 696 EIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLN-H 754
+AL+K L+ +K Q+ + + K+A S DL Q E
Sbjct: 513 -VALEKSQSEYLKISQEKEQQESLALEELELQKKAILTESENKLR DLQQEAETYRTR 568

Query: 755 VTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEV--HAFD 812
+ SL+ + SL QE K Q ++ + E K N E+ + H+ +ELE H D
Sbjct: 569 ILELESSLEKSL QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHQQD 624

Query: 813 KKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLRE FQEEMAALKENLLED-DK 862
E QVL+ +Q+Q +++ L K EQ +E FQ + + E LE D
Sbjct: 625 ALWTE-KLQVLK--QQYQTEMEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDV 681

Query: 863 EPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLSGEK 922
+ LS++++++L Q ++L ++ EQ N+ +
Sbjct: 682 KQTELE--SLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSI 739

Query: 923 DHLHSVMVHLQQENKKLKKEIEEKKM 948
H V + Q+ K LK +I + ++
Sbjct: 740 IKEHEVSI--QRTEKALKDQINQLEL 763




Score = 183
Identities = 132/584 (22%), Positives = 251/584 (42%)

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK-QDKSKEAECKALQAEVQK 467
M ++L++K+++ Q L + + +T M + + + + E + Q
Sbjct: 1 MFKKLKQKISEEQQQLQQALAPAQASSNSSTPTRMRSRTSSFTEQLDEGTPNRESGDTQS 60

Query: 468 LKNSLE-EAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA--DTIQEL 524
L+ EL + ++ + + R+ L LD A D ++
Sbjct: 61 FAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTASFDPPSDM 120

Query: 525 QRELQMLQKESSMAEKEQTSNRKRVEELSL ELSEALRKLENSDKEKRQLQKTVAE 579
E + L S KEQ R R E SL + SE + + +EK++LQ ++ +
Sbjct: 121 DSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQ 180

Query: 580 -QDMKMNDMLDRIKHQHREQGSIKCKLEE DLQEATK LLEDKREQLKKSKEHEKL 632
QD + + + + +q + f; EE L+E + +L+ + LK+ + +
Sbjct: 181 SQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQVSLLKQRLRNGPM 240

Query: 633 MEGELEALRQ-EFKKKDKTLKENSRKLEE ENENLRAELQCCSTQLESSLNKYNTSQQ 688
L+ L Q E + + T +EN E E+ L+ +++ N + +
Sbjct: 241 NVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKE 300

Query: 689 VIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQA 748
jg l +lq QLD+ LQ E ++ E +++ A +L +
Sbjct: 301 TIQSHKEQCTLLTSEKEALQEQLDERLQ-ELEKIKDLHMAEKTKLITQLRDA--KNLIEQ 357

Query: 749 LEK-LNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELE 807
LE+ V -t-ETK + + +T E K EEEI R+K++ T+ +LR Q+ + E
Sbjct: 358 LEQDKGMVIAETK RQMHETLEMK EEEIAQLRSRI KQMTTQGEELR--EQKEKSE 409

Query: 808 VHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQ EEMAALKENLLEDDKE 863
AF EE+ + QK + K+A +EQ++ + EE +L++ L +E
Sbjct: 410 RAAF EELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQELSRVKQE 465

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEKLGNQLR EQVNYIAK 917
+ + S + +L + +++ + EQ K+ + + Q++ Q Y+ K
Sbjct: 466 VVDVMKKSSEEQIAKLQKLHEKELARKEQELTKKLQTREREFQEQMKVALEKSQSEYL-K 524

Query: 918 LSGEKDHLHSVMVH-LQQENKKLKKEIEEK KMKAENTRLCTKALGPSRTESTQREK 972
+ S EK+ S++ L+ + K+ EEK + +AE R L S +S Q K
Sbjct: 525 ISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILELESSLEKSLQENK 584




Pedant information for DKFZphtes3_lgl3, frame 1 





Report for DKFZphtes3_lgl3.1

[LENGTH] 1007
[MW] 117480.77
[pI] 5.90
[HOMOL] TREMBL:AF092090_1 product: "cpl51"; Rattus norvegicus cpl51 mRNA, partial cds. 0.0
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 1e-16
[PIRKW] nucleus 3e-10
[PIRKW] phosphotransferase 6e-09
[PIRKW] duplication 2e-06
[PIRKW] citrulline 2e-12
[PIRKW] tandem repeat 1e-16
[PIRKW] endocytosis 2e-13
[PIRKW] heart 8e-13
[PIRKW] transmembrane protein 1e-13
[PIRKW] serine/threonine-specific protein kinase 6e-09
[PIRKW] zinc finger 2e-13
[PIRKW] metal binding 2e-13
[PIRKW] DNA binding 4e-12
[PIRKW] muscle contraction 1e-16
[PIRKW] acetylated amino end 1e-11
[PIRKW] actin binding 1e-16
[PIRKW] mitosis 5e-15
[PIRKW] microtubule binding 5e-15
[PIRKW] ATP 1e-16
[PIRKW] thick filament 1e-16
[PIRKW] phosphoprotein 4e-16
[PIRKW] skeletal muscle 2e-14
[PIRKW] calcium binding 2e-12
[PIRKW] alternative splicing 1e-16
[PIRKW] coiled coil 1e-16
[PIRKW] P-loop 1e-16
[PIRKW] heptad repeat 3e-10
[PIRKW] methylated amino acid 1e-16
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 2e-13
[PIRKW] cardiac muscle 8e-13
[PIRKW] hydrolase 1e-16
[PIRKW] microtubule 3e-10
[PIRKW] muscle 8e-13
[PIRKW] EF hand 2e-12
[PIRKW] cytoskeleton 2e-15
[PIRKW] hair 2e-12
[PIRKW] calmodulin binding 2e-13
[PIRKW] Golgi apparatus 3e-10
[SUPFAM] myosin heavy chain 1e-16
[SUPFAM] conserved hypothetical P115 protein 1e-07
[SUPFAM] centromere protein E 5e-15
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09
[SUPFAM] calmodulin repeat homology 2e-12
[SUPFAM] myosin motor domain homology 1e-16
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07
[SUPFAM] plectin 2e-07
[SUPFAM] trichohyalin 2e-12
[SUPFAM] pleckstrin repeat homology 8e-08
[SUPFAM] ribosomal protein S10 homology 2e-07
[SUPFAM] giantin 3e-13
[SUPFAM] protein kinase homology 6e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08
[SUPFAM] kinesin motor domain homology 5e-15
[SUPFAM] human early endosome antigen 1 2e-13
[SUPFAM] M5 protein 1e-07
[PROSITE] LEUCINE_ZIPPER 7
[PROSITE] MYRISTYL 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 20
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 15.00 % 

[KW] COILED_COIL 42.40 % 



SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRKIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCC CCCCCCCCCCCCCC 

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG . . .xxxxxxxxxx xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC 

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCC 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS

SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC 

SEG 

PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 

COILS 



Prosite for DKFZphtes3_lgl3.1
(No Pfam data available for DKFZphtes3_lgl3.1)



DKFZphtes3_lkll 



group: cell structure and motility 
DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus
actin-binding protein (ENC-1).

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural
induction in vertebrates. The protein is related to the kelch family of proteins and is expressed
during early gastrulation in the prospective neuroectodermal region of the epiblast and, later
in development, throughout the nervous system (NS). ENC-1 functions as an actin-binding protein
that organises the actin cytoskeleton during neural differentiation and development of the NS.
The novel protein is highly similar to ENC-1.

The new protein can find application in modulation of cytoskeleton organisation in human
testicular cells.



strong similarity to mouse ENC-1 

complete cDNA, complete cds, EST hits

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3525 bp 

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499
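
The two 3' features reported above can be located with a simple string scan. The sketch below is illustrative only: it assumes the canonical AATAAA hexamer as the polyadenylation signal and treats the trailing run of A residues as the poly A stretch; the function name `polya_features` is hypothetical. Applied to the last 75 bases of the listed insert (bases 3451 onward), it yields offsets that appear consistent with the reported positions when these are read as zero-based, though that reading is an inference rather than something the report states.

```python
def polya_features(seq, signal="AATAAA"):
    """Return (signal_index, polya_index) as zero-based offsets into seq.

    polya_index is where the trailing run of A residues begins;
    signal_index is the last occurrence of the signal hexamer that lies
    entirely upstream of that run (-1 if none is found).
    """
    seq = seq.upper()
    polya_index = len(seq.rstrip("A"))
    signal_index = seq.rfind(signal, 0, polya_index)
    return signal_index, polya_index


# Last 75 bases of the listed DKFZphtes3_lkll insert (1-based 3451..3525).
tail = (
    "CATGGTGCTT" "TCCCCAAAAG" "TTGTGTTGCT" "TTTATCAGTT" "TTCTAACTTA"
    "ATAAAAAGAG" "TTGAGAAAAA" "AAAAA"
)
offset = 3450  # zero-based index of the first base of `tail` in the insert
signal_index, polya_index = polya_features(tail)
print(signal_index + offset, polya_index + offset)
```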



1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC
51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC
101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG
151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC
201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC
251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC
301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC
351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG
401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT
451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT
501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC
551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG
601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGG
651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC
701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA
751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT
801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC
851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC
901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA
951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC
1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA
1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC
1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG
1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA
1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT
1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA
1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA
1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC
1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG
1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT
1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC
1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC
1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA
1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA
1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC
1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC
1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG
1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG
1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC
1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG
2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGCTTTAG
2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA
2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG
2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG
2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG
2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTGCCA CACATCCTTT
2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC
2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA
2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC
2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC
2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT
2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG
2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT
2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC
2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC
2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGAGGGAC CCCAGGGTGT
2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG
2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG
2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC
2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT
3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC
3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG
3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA
3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT
3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC
3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC
3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC
3351 AGCAGGAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG
3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC
3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA
3501 ATAAAAAGAG TTGAGAAAAA AAAAA



BLAST Results 



No BLAST result 



Medline entries 



98350113:

Cloning of human ENC-1 and evaluation of its expression
and regulation in nervous system tumors.

97252647:

ENC-1: a novel mammalian kelch-related gene specifically expressed in
the nervous system encodes an actin-binding protein.

98234394:

NRP/B, a novel nuclear matrix protein, associates with
p110(RB) and is involved in neuronal differentiation.



Peptide information for frame 2 



ORF from 116 bp to 1882 bp; peptide length: 589
Category: strong similarity to known protein 
Classification: Cell structure/motility 
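
The ORF coordinates and the peptide length can be cross-checked with a little arithmetic, and the start of the peptide can be recovered by translating the cDNA from the first ORF base. The sketch below assumes the standard genetic code and inclusive 1-based ORF coordinates; `orf_codon_count` and `translate` are hypothetical helper names, and the 30-base example is the stretch of the listed cDNA beginning at position 116.

```python
# Standard genetic code, codons enumerated in T, C, A, G order per position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codon_count(start, end):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna):
    """Translate DNA in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


codons = orf_codon_count(116, 1882)
first_ten = translate("ATGTCGGTCAGTGTCCATGAGACCCGCAAG")
print(codons, first_ten)
```

Since 1882 - 116 + 1 = 1767 = 3 x 589, the quoted coordinates appear to cover the coding codons only, with the stop codon lying just downstream.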



1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL
51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL
101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL
151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD
201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV
251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL
301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS
351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA
401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG
451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD
501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ
551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA



BLASTP hits 



Entry MMU65079__1 from database TREMBL: 

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus 
actin-binding protein (ENC-1) mRNA, complete cds. 

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589 
Entry AF059611_1 from database TREMBLNEW: 

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens 

nuclear matrix protein NRP/B (NRPB) mRNA, complete cds. 

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589 

Entry AF010314_1 from database TREMBL: 

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA,
complete cds. 

Score = 1745, P = 7.8e-180, identities = 335/507, positives = 403/507 
Entry KELC_DROME from database SWISSPROT:

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring
canal protein"; Drosophila melanogaster ring canal protein and ORF2
mRNA, complete cds.

Score = 572, P = 3.9e-66, identities = 168/536, positives = 257/536 



Alert BLASTP hits for DKFZphtes3_lkll, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_lkll, frame 2

Report for DKFZphtes3_lkll.2

[LENGTH] 589
[MW] 65923.45
[pI] 6.10
[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus actin-binding protein (ENC-1) mRNA, complete cds. 0.0
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 2e-09
[BLOCKS] BL01016D Glycoprotease family proteins
[PIRKW] zinc finger 1e-08
[PIRKW] DNA binding 1e-08
[PIRKW] transcription factor 1e-08
[SUPFAM] POZ domain homology 3e-68
[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15
[SUPFAM] A55R protein 5e-29
[SUPFAM] hypothetical protein YHR158c 4e-08
[SUPFAM] A55R protein middle region homology 5e-29
[SUPFAM] myxoma virus M9-R protein 1e-14
[SUPFAM] A55R protein carboxyl-terminal homology 5e-29
[KW] Alpha_Beta



SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 

PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh 

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL

PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh 

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR 

PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ QSEDFNSLSKDTLLDLISSDELETEDERVVFEAILQWVKHDLEPRKVHLPELLRSVRLAL

PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc 

SEQ LPSDCLQEAVSSEALLMADERTKLIMDEALRCKTRILQNDGVVTSPCARPRKAGHTLLIL 

PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee 

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV 

PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee 

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG

PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc 

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ 

PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL 

PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee 

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA 

PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc 



(No Prosite data available for DKFZphtes3_lkll.2)
(No Pfam data available for DKFZphtes3_lkll.2)



DKFZphtes3_ln3



group: signal transduction 

DKFZphtes3_ln3 encodes a novel 1196 amino acid protein with similarity to S. pombe Tup1
protein.

The protein contains one WD-40 repeat, a motif typical of the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, an RGD site is present.
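
A short motif such as the RGD tripeptide can be located by a plain substring search over the peptide. The sketch below is illustrative: `find_motif` is a hypothetical helper, and the 50-residue fragment is the stretch of the listed peptide that begins at residue 1051, so the `offset` argument converts string indices into 1-based residue numbers.

```python
def find_motif(peptide, motif="RGD", offset=1):
    """Return the residue numbers at which motif starts in peptide.

    offset is the residue number of the first character of peptide,
    so the default of 1 gives ordinary 1-based positions.
    """
    peptide = peptide.upper().replace(" ", "")
    positions = []
    i = peptide.find(motif)
    while i != -1:
        positions.append(i + offset)
        i = peptide.find(motif, i + 1)
    return positions


# Residues 1051-1100 of the listed DKFZphtes3_ln3 peptide.
fragment = "DTAPTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGKGQEGY"
rgd_sites = find_motif(fragment, offset=1051)
print(rgd_sites)
```

A substring match only shows that the tripeptide occurs; whether it acts as a functional cell-attachment site is a separate question that the sequence alone does not settle.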

The new protein can find application in modulating/blocking G-protein-dependent pathways. 



similarity to Tup1p

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="6q24" 
Insert length: 5277 bp 

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244



1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA 
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC 
101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC 
151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG 
201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA 

251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG
301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC
351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA
401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA
451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC
501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG 
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC 
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA 
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG 
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC 
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG 
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC 
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA 
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT 
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA 

1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC 
1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT 
1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT 
1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA 
1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA 
1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG 
1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC 
1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA 
1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT
1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT
1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT
1551 TGTTGAGGCA TTTGAATGGT GGTCAAAATG TCCAAGAAAT CATTACCCAT 
1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG 
1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT 
1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT 
1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT 
1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG 
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT 
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA 
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC 
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG 
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA 
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA 
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG 
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA 
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC 
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA 
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA 
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA 
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG 
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG 
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG 
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC
4101 ATGCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGAAATATC
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA
5251 AATATCTGTT TCTCTGCAAA AAAAAAA



BLAST Results 



Entry HS32B1 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1 
Score = 4445, P = 0.0e+00, identities = 889/889 

Entry U93816 from database EMBL:

Human exon-trapped sequence from 6q24. 

Score = 965, P = 4.0e-35, identities = 193/193 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein 
1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR
51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK
101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV
151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM
201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS
251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK
301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH
351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI
401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL
451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY
501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM
551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK
601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL
651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV
701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL
751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG
801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST
851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN
901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC
951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN
1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV
1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY
1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS
1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_ln3, frame 1

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete
cds., N = 1, Score = 186, P = 1e-10

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N =
2, Score = 228, P = 1e-13

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin";
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1,
N = 1, Score = 215, P = 2.3e-13

SWISSPROT:YZLL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13



>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6 
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein 
(Pmc733) mRNA, complete cds. 
Length = 321 

HSPs: 

Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18 
Identities = 59/225 (26%), Positives = 111/225 (49%) 

Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706
+ E GH + I DLSWSK+ +L++S D T R+W ++ + +V H ++V +F+
Sbjct: 63 VHEFYGHGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766
P +TGC D ++RIW V LV + K + ++C+ +G +G TG
Sbjct: 120 PTNGNYFITGCIDGLVRIWDVRK-----CLVVDWANSKEIVTAVCYRPDGKGAVAGTITG 174

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826
++ +LE V ++N K + + Y P K+L++ + D+ +RI
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871
+D +++ +G+ +++TPG++ S+D +Y+WN
Sbjct: 230 LDGAHVISN-YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272



Pedant information for DKFZphtes3_ln3, frame 1 



Report for DKFZphtes3_ln3.1



[LENGTH] 1196
[MW] 137114.70
[pI] 6.79
[HOMOL] SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN C14B1.4 IN CHROMOSOME III. 8e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-08
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 9e-08
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 9e-08
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 4e-06
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 4e-05
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 4e-05
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 4e-05
[FUNCAT] 03.01 cell growth [S. cerevisiae, YCR088w] 6e-05
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05
[BLOCKS] BL00024H
[SCOP] d1tbgd_ 2.46 .1.1 beta1-subunit of the signal-transducing 3e-91
[SCOP] d1gfc 2.21 .1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14
[SCOP] d1fmk_1 2.21 (1-64) c-src tyrosine kinase [human (Hom 5e-15
[SCOP] d1ad5b1 2.21 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15
[SCOP] d1lcka1 2.21 .16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13
[SCOP] d1qwea_ 2.21 .15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15
[SCOP] d1shg_ 2.21 .6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13
[SCOP] d1prmc_ 2.21 .1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15
[SCOP] d1hsq 2.21 .1.12 Phospholipase C, SH3 domain [human (Hom 2e-13
[SCOP] d1aboa_ 2.21 .1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13
[SCOP] d1efna_ 2.21 .1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15
[SCOP] d1sema_ 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13
[SCOP] d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16
[SCOP] d1ckaa 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15
[EC] 3.1.4.3 Phospholipase C 2e-07
[EC] 3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07
[EC] 3.6.1.32 Myosin ATPase 7e-07
[EC] 2.7.1.112 Protein-tyrosine kinase 8e-06
[PIRKW] nucleus 2e-08
[PIRKW] phosphotransferase 8e-06
[PIRKW] plasma 4e-07
[PIRKW] duplication 4e-07
[PIRKW] phosphoric diester hydrolase 2e-07
[PIRKW] tandem repeat 7e-07
[PIRKW] hormone 4e-07
[PIRKW] transmembrane protein 2e-06
[PIRKW] stomach 4e-07
[PIRKW] actin binding 7e-07
[PIRKW] ATP 7e-07
[PIRKW] phosphoprotein 7e-07
[PIRKW] signal transduction 7e-09
[PIRKW] heterotrimer 7e-09
[PIRKW] P-loop 7e-07
[PIRKW] hydrolase 7e-07
[PIRKW] transcription regulation 5e-06
[PIRKW] GTP binding 7e-09
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07
[SUPFAM] SH3 homology 2e-07
[SUPFAM] SH2 homology 2e-07
[SUPFAM] protozoan myosin heavy chain IB 7e-07
[SUPFAM] myosin motor domain homology 7e-07
[SUPFAM] pleckstrin repeat homology 2e-07
[SUPFAM] protein-tyrosine kinase src 8e-06
[SUPFAM] WD repeat homology 3e-12
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07
[SUPFAM] protein kinase homology 8e-06
[SUPFAM] 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07
[SUPFAM] GTP-binding regulatory protein beta chain 7e-09
[SUPFAM] yeast coatomer complex alpha chain 4e-07
[PROSITE] RGD 1
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 25
[PROSITE] TYR_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 19
[PROSITE] ASN_GLYCOSYLATION 6
[PFAM] Src homology domain 3
[PFAM] WD domain, G-beta repeats
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 5.77 %
[KW] COILED_COIL 2.42 %



SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT
SEG xxxxxxxx
COILS ccccccccccccccccccccccccccccc
1gotB

SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED
SEG
COILS
1gotB

SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTKQKTHTKPQPGVDHQKSEKANEGREET
SEG xxx
COILS
1gotB

SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE
SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx
COILS
1gotB

SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK
SEG xxxxxxxxxx xxxx
COILS
1gotB

SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM
SEG xxxxxxxxx
COILS
1gotB

SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW
SEG
COILS
1gotB

SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK
SEG
COILS
1gotB

SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP
SEG
COILS
1gotB

SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK
SEG
COILS
1gotB

SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL
SEG
COILS
1gotB CEEEEEECCCCCEEEE

SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS
SEG
COILS
1gotB EETTTTTEEEEEETTTEEEEEETT--TTCEEEEEETTTCEEEEEETTT-TCEEEEEETTT

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL
SEG
COILS
1gotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEEEEEE

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA
SEG
COILS
1gotB

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN
SEG
COILS
1gotB

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS
SEG
COILS
1gotB

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN
SEG
COILS
1gotB

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV
SEG
COILS
1gotB

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP
SEG
COILS
1gotB

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE
SEG
COILS
1gotB



Prosite for DKFZphtes3_ln3.1

PS00006 1170->1174 CK2_PHOSPHO_SITE PDOC00006
PS00007 1083->1091 TYR_PHOSPHO_SITE PDOC00007
PS00008 MYRISTYL PDOC00008
AMIDATION PDOC00009
PS00016 1074->1077 RGD PDOC00016



Pfam for DKFZphtes3_ln3.1

HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD-
+ GH+N ++++++S D ++ I+++S DGT R+W
Query 650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK 682

HMM_NAME Src homology domain 3

HMM *pyVIALYDYqAqdpDELSFkEGDIIiIIEdsDD.WWrgRnnnTNGQEGW
P+V+ALYDY+A+++DEL++ +GDII + + +++ WW+G GQEG+
Query 1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 1100

HMM IPSNYVEPi*
+P+N V+ +
Query 1101 FPANHVASE 1109
DKFZphtes3_20c21



group: testes derived

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown

Sequenced by MediGenomix
Locus: /map="22q11.2-12.2"
Insert length: 3991 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 

1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 

101 AGTCCCGAAC CCCGAGCACT GGGATGCCTG GCTACTCCGA GCCAAGGCAC 

151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 

201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 

251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 

301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 

351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 

401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 

451 ATGCTGAGTG AAAACTAACC AAGTGGCAGT TGTGACTGAA AACACTGAAA 

501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 

551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 

601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 

651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 

701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 

751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 

801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 

851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 

901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 

951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC CTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC 
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTG CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG CAGTTCAGGC 
2151 TCAGCAAACT GTCAGGGTGC TGGCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG 
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT 
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT TGCAGATTGG
2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA GACCCAGAGC
2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC GGGACGGGCT
3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA TCTCCTGCTG
3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT GGCATTTTTT
3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC AACCCCCACA
3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC GGATCTGTGA
3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC AGTGCGCTCT
3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC ACACGGAGAC
3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG GGCGTCTTCT
3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG GAGTTTGGGC
3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT GAGAATTCAG
3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG GGCCGTGCAG
3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC AAGATGTCTG
3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGAA AGGGAGCGAC
3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC CTCAGTTCTA
3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG ACTGCAGGAT
3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA GTACTTGAGA
3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT GAACTGATAT
3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT
3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 



Entry HS1048E9 from database EMBLNEW : 

Human DNA sequence from clone 1048E9 on chromosome 22q11.2-12.2 
Contains pseudogene similar to ribosomal protein S3A and part of a gene 
similar to C. elegans protein CE02118, ESTs, STS, GSS. 
Score = 6540, P = 0.0e+00, identities = 1308/1308 
-14 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 
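
The ORF coordinates and peptide length quoted above can be cross-checked with a few lines of Python. This is only an illustrative sketch, not part of the original analysis pipeline: the helper names `orf_peptide_length` and `translate` are hypothetical, the standard genetic code is assumed, and the coordinates are assumed to be 1-based, inclusive, and to exclude the stop codon. The excerpt is the row of the nucleotide listing that starts at position 601.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start, end):
    """Residue count for an ORF given 1-based inclusive coordinates (stop codon excluded)."""
    span = end - start + 1
    if span <= 0 or span % 3:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon; blanks between 10-mers are ignored."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Row 601 of the nucleotide listing (assumed to be transcribed correctly).
row_601 = "GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA".replace(" ", "")
offset = 618 - 601  # the ORF is reported to start at position 618

print(orf_peptide_length(618, 2741))             # 708
print(translate(row_601[offset:offset + 30]))    # MATSTSTEAK
```

Both values agree with the reported peptide length and with the first ten residues of the peptide listing that follows.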

1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 

51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL 

101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 

151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI 

201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 

251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 

301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL 

351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 

401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS 

451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES 

501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG 

551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS 

601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT 

651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL 
701 LKHGVNLL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 

Report for DKFZphtes3_20c21.3 

[LENGTH] 708 

[MW] 76900.23 

[pI] 5.30 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.36 % 
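
For downstream processing, report lines of the form `[TAG] value` can be gathered into a simple mapping. The sketch below is an assumption about how one might read such lines, not the format's official parser; the function name `parse_pedant_tags` is hypothetical, and tags are upper-cased so that `[pi]` and `[pI]` are treated alike.

```python
import re

# "[TAG] value", tolerating stray blanks inside the brackets (e.g. "[ KW]").
TAG_LINE = re.compile(r"^\[\s*(\w+)\s*\]\s*(.*)$")


def parse_pedant_tags(lines):
    """Collect '[TAG] value' lines into {TAG: [values, ...]}; a tag may repeat."""
    fields = {}
    for line in lines:
        match = TAG_LINE.match(line.strip())
        if match:
            tag = match.group(1).upper()
            fields.setdefault(tag, []).append(match.group(2))
    return fields


report = [
    "[LENGTH] 708",
    "[MW] 76900.23",
    "[pI] 5.30",
    "[KW] Alpha_Beta",
    "[ KW] LOW_COMPLEXITY 6.36 %",
]
fields = parse_pedant_tags(report)
print(fields["LENGTH"])  # ['708']
print(fields["KW"])      # ['Alpha_Beta', 'LOW_COMPLEXITY 6.36 %']
```

Keeping a list per tag preserves repeated keys such as `[KW]` or `[FUNCAT]` in their original order.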

SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN 

SEG 

PRD eeeeeeccccccchhhhhhhhheeeeccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH 

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh 

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS 

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA 

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhccceeeee 

SEQ CCNPIQETYFQQLAPAARSSGFPNPQDGAFSLSGKAKQKLLKHGVNLL 

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 

(No Prosite data available for DKFZphtes3_20c21.3) 
(No Pfam data available for DKFZphtes3_20c21.3) 

DKFZphtes3_20k2 



group: signal transduction 

DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid 
receptor subtype 1. 

VR1 seems to play an important role in the activation and sensitization of nociceptors. It is 
the receptor for, e.g., capsaicin, a selective activator of nociceptors and a natural product of 
capsicum peppers. The novel protein is the human orthologue of rat VR1. 

The new protein can find application as a target for the development of new 
nociception-modulating drugs. 



strong similarity to rat vanilloid receptor subtype 1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135 



1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT
651 GCCAGGATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT
901 CGAGAGACGC AACATGGCCC TGGTGACCCT CCTGGTGGAG AACGGAGCAG
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGGCGG
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG
1451 AACTCGGTGC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG
2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC
2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG
2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG
2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT
2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG
3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG
3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA
3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA
3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT
3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA
3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG
3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG
3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT
3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT
3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC
3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG
3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG
3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA
3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG
3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA
3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA
3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG
3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG
3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC
3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG
4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT
4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG
4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT
4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



99288727: 
Recent advances in neuropharmacology of cutaneous nociceptors. 

99231880: 
A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates 
rat dorsal root ganglion neurons via interaction at vanilloid receptors. 



Peptide information for frame 2 



ORF from 272 bp to 2788 bp; peptide length: 839 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 



1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF
51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD
101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK
151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY
201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE
251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA
301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL
351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI
401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT
451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR
501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT
551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE
601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF
651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL
701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW
751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA
801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20k2, frame 2 

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1, 
Score = 3760, P = 0 

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel 
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective 
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219 



>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus 
norvegicus vanilloid receptor subtype 1 mRNA, complete cds. 
Length = 838 

HSPs: 

Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 721/839 (85%), Positives = 773/839 (92%) 
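
The percentages in the HSP summary appear to be truncated rather than rounded: 721/839 is about 85.9 %, yet 85% is printed. A minimal sketch of that arithmetic follows; the function name `percent_truncated` is hypothetical, and truncation is an inference from the printed figures rather than a documented property of the program that produced them.

```python
def percent_truncated(matches, aligned_length):
    """Integer percentage of matches over the alignment length, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


print(percent_truncated(721, 839))  # 85  (identities)
print(percent_truncated(773, 839))  # 92  (positives)
```

Rounding to the nearest integer would instead give 86 for the identities, which does not match the report.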

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+ + +C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P 
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120 

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+ 
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119 

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180 

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++ 
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179 

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240 

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT 
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239 

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300 

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT 
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299 

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360 

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E 
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359 

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420 

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419 

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479 

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++  +GDYFRVTGEI 
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEI 479 

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539 

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA 
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539 

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599 

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI 
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599 

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659 

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658 

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719 

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718 

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779 

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778 

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839 

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK 
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838 



Pedant information for DKFZphtes3_20k2, frame 2 
Report for DKFZphtes3_20k2.2 



[LENGTH] 839 

[MW] 94950.75 

[pI] 6.90 

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus 
vanilloid receptor subtype 1 mRNA, complete cds. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05 

[PIRKW] alternative splicing 3e-06 

[PIRKW] peripheral membrane protein 3e-06 

[SUPFAM] ankyrin repeat homology 3e-06 

[SUPFAM] unassigned ankyrin repeat proteins 3e-06 

[PFAM] Ank repeat 

[KW] TRANSMEMBRANE 4 



SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh 

MEM 

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 

SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS 

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE 

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec 

MEM MMMMMMMMMMMMMMMMM . 

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF 

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MM 

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM 

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS 

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 



(No Prosite data available for DKFZphtes3_20k2 . 2) 
HMM_NAME Ank repeat 

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN* 

G+T+LHIA +++N+ +V LL+++GAD+ 
Query 202 GQTALHIAIERRNMALVTLLVENGADVQ 229 
DKFZphtes3_2013 



group: transmembrane protein 

DKFZphtes3_2013 encodes a novel 595 amino acid protein with partial similarity to the IL-17 
receptor. 

The novel protein contains one transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 

similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp 

Poly A stretch at pos. 2345, no polyadenylation signal found 

1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 
51 AGCCAGCAGA AACAGTGGGC TGTACAACAT CACCTTCAAA TATGACAATT 

101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 

151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 

201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 

251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 

301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 

351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 

401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 

451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG 

501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 

551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 

601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 

651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 

701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA 

751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG 

801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 

851 TCGCGACGCT CTTCACTGTG ATGTCCCGCA AGAAGCAACA AGAAAATATA 

901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 

951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 
1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 
1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 
1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 
1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 
1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 
1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 
1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 
1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 
1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 
1451 CCCGAGACCA CGGCCTCCAG GAGCCGGGGC AGCACACGCG ACAGGGCAGC 
1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 
1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 
1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 
1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 
1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 
1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 
1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 
1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG 
1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG 
1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 
2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC CTTCCTTCCA 
2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 
2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA 
2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 
2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 
2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA 
2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2401 AAAAAA 

BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 

1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL 

51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR 

101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW 

151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT 

201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW 

251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS 

301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL 

351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV 

401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK 

451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ 

501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE 

551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL 
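
The ORF coordinates and the first residues of this peptide can be cross-checked against the nucleotide listing. The following Python sketch is illustrative only: it assumes 1-based, inclusive ORF coordinates and the standard genetic code, and the names `orf_codons`, `ROW_301` and `ROW_351` are hypothetical labels for the two sequence rows copied from the listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def orf_codons(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


# Rows 301 and 351 of the insert, copied from the listing (positions 301-400).
ROW_301 = "CTAAAGGATCCGAAGCAGCTCAACAGTAGCTTCAAAAGAACTGGAATGGA"
ROW_351 = "ATCTCAACCTTTCCTGAATATGAAATTTGAAACGGATTATTTCGTAAAGG"

fragment = ROW_301 + ROW_351
coding = fragment[346 - 301:]              # start at the ATG at position 346
usable = len(coding) - len(coding) % 3     # drop the trailing partial codon
peptide = "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))

print(orf_codons(346, 2130))  # codon count implied by the ORF coordinates
print(peptide)                # N-terminal residues encoded by the fragment
```

Under these assumptions the codon count equals the stated peptide length, and the translated fragment agrees with the start of the peptide listing.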



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2013, frame 1 

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14 

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus 
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P = 
1.1e-13 



>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor 
mRNA, complete cds. 

Length = 866 

HSPs: 

Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14 
Identities = 85/284 (29%), Positives = 131/284 (46%) 



[Alignment layout not recoverable. Query segments begin at residues 213, 269, 325, 384 and 435; 
Sbjct segments begin at residues 379, 438, 498, 551 and 611. Surviving fragments, in source order:] 

KV++ YS+ D +++VV FA FL CG EVALDL E+ ++ WV QK 268 
IIV+CS+G + +LF A++ I 
++ YF + SC+GDVP + + +Y LMD ++ 
-YVVCYFSEVSCDGDVPDLFGAAPRYPLMDRFEEV- 
+ +D + +PG+ R 
YFRIQDLEMFQPGRMHRV 
G S NY RS GR L A+ PDWFE + + P L 550 
+ EP+ +G+V ++ PS CL++ VG G A G+ 
[LENGTH] 595 

[MW] 66847.05 

[pi] 6.27 

[HOMOL] TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus interleukin 
17 receptor mRNA, complete cds. 2e-14 

[BLOCKS] BL00740A MAM domain proteins 

[BLOCKS] BL01224B N-acetyl-gamma-glutamyl-phosphate reductase proteins 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 13.61 % 



SEQ MESQPFLNMKFETDYFVKVVPFPSIKNESNYHPFFFRTRACDLLLQPDNLACKPFWKPRN 

SEG 

PRD ccccccccccccccceeeeeccccccccccceeeeeeceeeeeeeccccccccccccccc 

MEM 

SEQ LNISQHGSDMQVSFDHAPHNFGFRFFYLHYKLKHEGPFKRKTCKQEQTTEMTSCLLQNVS 

SEG 

PRD eeeecccccceeeecccccccceeeeeehhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ PGDYIIELVDDTNTTRKVMHYALKPVHSPWAGPIRAVAITVPLVVISAFATLFTVMCRKK 

SEG 

PRD ccceeeeeeccccccccccccccccccccccccceeeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ QQENIYSHLDEESSESSTYTAALPRERLRPRPKVFLCYSSKDGQNHMNVVQCFAYFLQDF 

SEG xxxxxxx xxxxxxxxxx 

PRD hhhhhhhhhcccccccceeeeccccccccccceeeeeeecccccchhhhhhhhhhhhhhc 

MEM 

SEQ CGCEVALDLWEDFSLCREGQREWVIQKIHESQFIIVVCSKGMKYFVDKKNYKHKGGGRGS 

SEG xxxxxxxxx 

PRD ccchhhhhhhhccccccccchhhhhhhhhhheeeeeeeeccceeeeeccccccccccccc 

MEM 

SEQ GKGELFLVAVSAIAEKLRQAKQSSSAALSKFIAVYFDYSCEGDVPGILDLSTKYRLMDNL 

SEG xxx xxxxxxxxxxxxxxx 

PRD ccceeeeehhhhhhhhhhhhhhcchhhhhhhheeeeccccccccccccccchhhhhhccc 

MEM 

SEQ PQLCSHLHSRDHGLQEPGQHTRQGSRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQF 

SEG 

PRD cchhhhhhcccccccccccccccccceeeeccccccceeeeeeceeeecccccceeeeee 

MEM 

SEQ VPFHPPPLRYREPVLEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQH 

SEG 

PRD eecccccccccceeeeeccccceeeeecccccccccchhhhhhhhhhccccccccccccc 

MEM 

SEQ GGLDQDGEARPALDGSAALQPLLHTVKAGSPSDMPRDSGIYDSSVPSSELSLPLMEGLST 

SEG xxxxxxxxxxxxxxxxx . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccchh 

MEM 

SEQ DQTETSSLTESVSSSSGLGEEEPPALPSKLLSSGSCKADLGCRSYTDELHAVAPL 

SEG . . xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhheeecccccccccccccccceeeccccceeeccccccccceeeeccc 

MEM 



(No Prosite data available for DKFZphtes3_2013 . 1) 
(No Pfam data available for DKFZphtes3_2013 . 1) 
DKFZphtes3_20ml8 



group: nucleic acid management 

DKFZphtes3_20ml8 encodes a novel 132 amino acid protein with similarity to the S. cerevisiae 
mitochondrial carrier protein RIM2. 

The novel protein contains a leucine zipper and a Prosite mitochondrial energy transfer 
proteins signature. It is a member of a family of substrate carrier proteins that are found in 
the inner mitochondrial membrane and are involved in energy transfer. The RIM2/MRS12 gene 
encodes a predicted protein of 377 amino acids that is essential for mitochondrial DNA 
metabolism and proper cell growth. Inactivation of this gene causes the total loss of 
mitochondrial DNA and, compared to wild-type rho0 controls, a slow-growth phenotype on media 
containing glucose. The novel protein seems to be the human orthologue of this protein. 

The new protein can find application in modulation of mitochondrial DNA replication and 
maintenance. 



similarity to carrier protein RIM2 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 3572 bp 

Poly A stretch at pos. 3530, polyadenylation signal at pos. 3510 



1 GCCGCGGGGA GGGCTGTGCC GGTTGCTTTC TGCAGCCGCA TCTCGGCCAG 

51 CTCTCCTCGC CGTCCCCGGG GCGCTGTGCG TCTCCAGTCC GGGACCGAAG 

101 CCGCCTGCCG TAGCGGGCGG CCAGATCCGC GTCCCGCCTC AGCGGCCGGA 

151 GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT 
201 TGCCGGAGGA TGTGGTGGTA CAGTGGGAGC TATTCTGACA TGTCCACTGG 

251 AAGTTGTAAA AACACGACTG CAGTCATCTT CTGTGACGCT TTATATTTCT 

301 GAAGTTCAGC TGAACACCAT GGCTGGAGCC AGTGTCAACC GAGTAGTGTC 

351 TCCCGGACCT CTTCATTGCC TAAAGGTGAT CTTGGAAAAA GAAGGGCCTC 
401 GTTCCTTGTT TAGAGGACTA GGCCCCAATT TAGTGGGGGT AGCCCCTTCC 

451 AGAGCAATAT ACTTTGCTGC TTATTCAAAC TGCAAGGAAA AGTTGAATGA 

501 TGTATTTGAT CCTGATTCTA CCCAAGTACA TATGATTTCA GCTGCAATGG 

551 CAGGTATGAA TGTATAATAT TAAAAAAAAA AAAAACTTTC TGAAACCTAG 

601 AGGCTTAATA TTGAATTATA AGTTTGTAGT GAAAAGTTGA TGATTAATGT 

651 GCTTTTCATT GATTAGATGA TTTTTACGTT TATCGATATA AACCAAATTA 
701 GGTATATGTA AAATCTGTCA TCAGTTGACA TTTTTGTAGT CAGGAGTTTA 

751 CATGCTAGGG TACAAGTAAT ATATTTATAT TGCCTTGTGT AGTCCACTGA 

801 ATGTTTAGTG ATCATTGTTA ACAGTTTTAA GAATCCAACC ATAATTACAC 

851 TATAAATAAG TTATGGAGCT GTAATTTACT CTTCTCTCCT CAATTTCTGT 

901 TAGTGCCTTT TCCCTTTTTG CTGCATGTTT TGGCTTCTGT CTGAAATGTG 

951 TCGGCAATTC TTGGTAAAGT ATTCATTTTG TCCTGTGCTC AAATGCTGAA 

1001 ATTTTTGTGA GTGATGTATT ATTATTGACA ATTCAGTTAC TATGTGTATT 

1051 TTTTAAAATT GTTTATTATT CTACATAATT CACACTAGAC AGCACCTGAA 

1101 ATTTAGACAC TGGCTATGTG TACATGCTTA CTATAGAAAT GTTTCCAGGA 

1151 ACTCTCTGTT TCTGTCATCA CTGATAAGTA TATATGATTC TGAATTAAAA 

1201 TAACTAGTTT TAGGTCTTTA CCCTGCCATA AAGATAAACA GTTGGTTTGA 

1251 CCAATCTGGT TCTGGAATCA TTTGCTGCTA TGCATGTTAG ACAAAGCCAC 

1301 GAACTTTGAT TTTCCATTGA AAATTCTCCC TAATATCTGA GATTTATTGT 

1351 ATATTTACTC ATATCTCACA TTTTCAAATT ATGCTGTAAC TTTATAAACT 

1401 GTAGCTGCTT TCATCAGCTA TTGATCAATA AATTGAATGT CAATTATGTG 

1451 CTTAATAATG AGTGCCTTAA ACTGTTAAAC ACTTTTGGTT TAGAAATAAA 

1501 GTGAATCAAT TTGACCTATA TACTTCATGA AGTAAGTAAG TTTGAAATAC 

1551 AAATTTCTGA AAGGTCAATA GCCCTTATCG TATTACAAAT TGTTTTTAAG 

1601 GCTTTTTGTA TTTATTAATT GTCAGTTGAT TCACTGAAGC TTTAAAACTG 

1651 GAAGGGACAA TCCAAAGGTC AAAAGAGTGA AATACAATCA TTTACCAATA 

1701 AGGAAACCTT GGGCAAATTA TGTAATTTAT GTGAACCTCT CTTAGCTTAC 

1751 CCATGGAATG AGTCAAGTGG TCTACATAGA TTTGGATTTT GAGAATTAGT 

1801 TCTTTCATTT AGTGTTATAG AGATTATCTT GTTACAACTA GAATTATTTT 

1851 TAATGTAATT TTTACAGATG TTGAATATTA GTAGATAGGA TTTTTCCCCT 

1901 ACGAATTTGG ATGTAAGGTA AAGGTTGGTG GCCAGTGACA AACCTTATAA 

1951 CCACTTTATC AGGTTCTTTA AAAATATATT TGTGAATTAC CAGTGATTAT 

2001 GTTTTTGGCT TATAACCTCA GATAATTATA AAGAAATGTT AATCTTATTT 

2051 GAAAGAATTG GAATCTAGAA AGTTAGATGA GCAGTCATTT TATATTGATA 

2101 TTTGTTATAT CAGTATAGCA AATGCAGAGG TTCAGAATAT CTTTATTTCC 

2151 ACTGGAACAT CTTATTTCAT TAGAGTATCT CATCAGAATT TATTACTGTA 

2201 TTTGTATCAC ATTGCAAAGA ATTTCAGTAG AATTGTCAGT TTGCACTTTT 

2251 TTCTCAAATG TGTACAAATG TTAACATATA GTTCATTTTT ATCTGTACAT 

2301 TGATGCCATT TCCCAACTTG AATTCCTCAA GTTTTGGTAA ACTTACAATC 

2351 TCATACTTGT TCAGAGGTTA TTGCACTGTA CACTTACTGT GTAGAAAATA 

2401 CTGTTTGAAT TTGTTTGCAG TTACATTGTT CTGAGAACTG TGCTCTCAGA 

2451 GCTTCTGTGC ACTATTCATG AGCATTAACA CTTAGCCTTG CAGTTTTATA 
2501 CATAACTATA TGGTTAGTAA AACTGAATGG TCCAATGCAG ACTCATTAAA 
2551 GTAGGCTTTT GCCCCCTTTG TTCTTGAAAT AATCTAGACC AGATTACTCG 
2601 GGGTTTTTTT TAGGATTATT TTTATAGGTC TAAATATGAA TGATTTGGGG 
2651 GTATGAAGTA CTTAAAGATA GTTCTGTGAA AAATCATTTT CAGCTGTCTA 
2701 TTCAAGGGAA AAAATGCTAA CCTTGTCACT TTACTACACA AAACCACACT 
2751 AAAATAAACC ATTAATGATA CTGCCTGCAA GATTTTAACA CACCAGATAG 
2801 CACACACATT AAGGATTTAT AAGGCACTGT ACGTAATTTT TATTCCAAGT 
2851 GACCTCTCAA TTCATTTTCA TTTTGCATTT TATCCATATG AACTCATGTT 
2901 TAATTTAGAT AATAAAAATT TATTTTATTA AAAGGACAGT TTATTTAAAG 
2951 TGGGTCTTTT TATTTGTTGT AGTGCATACT ATAAGAATTT GTAAGCCTCT 
3001 AAAGTTGAGC TATAAATTTT CATGCATTAA AAATTTGTTT CAGTTGTGAG 
3051 GATATTTAAT CAGATTAAAT AATGTTGACT CTTAATATTT TGCCTGCCTT 
3101 TTTTTTCTCC TACACATGAC CTTTGACAGA CTAAGTATAT CTCAGCTATT 
3151 GAGGGTATCT GTTTTGTTGC CTGTATATTT TGTTTAAATT AACTTGTATA 
3201 TTCCTTTGTA TACACCTAGG CACAGATGTA TGCAAAAAAA ATTTGTTAAA 
3251 TTACTTCTTT CTTTATACTA ATTCTCAATT TTTAAAAGAT TTTATCTGGC 
3301 ATGTATATAC TTTTATATAG AACATTATAA ATGTAAAGGA AATGAATTCT 
3351 AATTTTAATT GGATTATGTA TTCATACAGT TATTCTCAAT TTTTAAAATA 
3401 CTAATAATGT AATCATTGAA TGTTTCCTAC ATACGTAGTG GGTTTTATTT 
3451 GCTCACAGCA TACAGTTATT TTTCAATTTA TGTTTTTCTA TTAGACTTAA 
3501 ATTTCATTAT AATAAAGGCT TTTACTCATT AAATACAAAA AAAAAAAAAA 
3551 AAAAAAAAAA AAAAAAAAAA AA 
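
The quoted polyadenylation signal position can be compared with the last two rows of the sequence. The sketch below is illustrative only: it assumes that the canonical AATAAA hexamer is the signal meant, and `find_signal`, `ROW_3501` and `ROW_3551` are hypothetical names for a helper and for the two rows copied from the listing above.

```python
# Last two rows of the insert as listed above (1-based positions 3501-3572).
ROW_3501 = "ATTTCATTATAATAAAGGCTTTTACTCATTAAATACAAAAAAAAAAAAAA"
ROW_3551 = "AAAAAAAAAAAAAAAAAAAAAA"
TAIL_START = 3501  # 1-based position of the first base in ROW_3501


def find_signal(tail, tail_start, motif="AATAAA"):
    """Return the 1-based position of the first motif occurrence, or None."""
    idx = tail.find(motif)
    if idx < 0:
        return None
    return tail_start + idx


tail = ROW_3501 + ROW_3551
signal_pos = find_signal(tail, TAIL_START)

print(signal_pos)      # 1-based start of the hexamer
print(signal_pos - 1)  # the same location expressed as a 0-based offset
```

Under these assumptions the hexamer starts one base after the quoted position, which suggests that the annotation reports a 0-based offset; this is an inference from the listing, not something the source states.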



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
95198680: 

Overexpression of a novel member of the mitochondrial carrier family rescues defects in both 
DNA and RNA metabolism in yeast mitochondria. 



Peptide information for frame 1 



ORF from 169 bp to 564 bp; peptide length: 132 
Category: similarity to known protein 
Classification: Intracellular transport and traffic 
Prosite motifs: LEUCINE_ZIPPER (27-49) 
MITOCH_CARRIER (26-36) 



1 MSQRDTLVHL FAGGCGGTVG AILTCPLEVV KTRLQSSSVT LYISEVQLNT 
51 MAGASVNRVV SPGPLHCLKV ILEKEGPRSL FRGLGPNLVG VAPSRAIYFA 
101 AYSNCKEKLN DVFDPDSTQV HMISAAMAGM NV 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_20ml8, frame 1 

PIR:S44092 probable carrier protein c2 - Caenorhabditis elegans, N = 2, 
Score = 147, P = 1.5e-19 

PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) , N = 1, Score = 230, P = 6.2e-19 



>PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast 
(Saccharomyces cerevisiae) 
Length = 377 

HSPs : 

Score = 230 (34.5 bits), Expect = 6.2e-19, P = 6.2e-19 
Identities = 55/133 (41%), Positives = 80/133 (60%) 

Query: 8 VHLFAGGCGGTVGAILTCPLEVVKTRLQSSS-VTLYISEVQLNTMAGA SVNRVVSP 62 

VH AGG GG GA++TCP ++VKTRLQS + Y S+ +N G+ S+N V+ 
Sbjct: 54 VHFVAGGIGGMAGAVVTCPFDLVKTRLQSDIFLKAYKSQA-VNT SKGSTRPKSINYVIQA 112 
Query: 63 GP LHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFD--P 115 

G L + + ++EG RSLF+GLGPNLVGV P+R+I FY K+ F+ 

Sbjct: 113 GTHFKETLGIIGNVYKQEGFRSLFKGLGPNLVGVIPARSINFFTYGTTKDMYAKAFNNGQ 172 

Query: 116 DSTQVHMISAAMAG 129 

++ +H+++AA AG 
Sbjct: 173 ETPMIHLMAAATAG 186 

Score = 77 (11.6 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 25/88 (28%), Positives = 39/88 (44%) 

Query: 3 QRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSP 62 

Q ++HL A G A T P+ ++KTR VQL+ SV + + 

Sbjct: 172 QETPMIHLMAAATAGWATATATNPIWLIKTR VQLDKAGKTSVRQYKNS 219 

Query: 63 GPLHCLKVILEKEGPRSLFRGLGPNLVG 90 

CLK ++ EG L++GL + +G 
Sbjct: 220 WD--CLKSVIRNEGFTGLYKGLSASYLG 245 

Score = 71 (10.7 bits), Expect = 6.6e+00, P = 1.0e+00 
Identities = 28/91 (30%), Positives = 45/91 (49%) 

Query: 12 AGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSPGPLHCLKVI 71 

+ G V +I T P EVV+TRL+ + +NG R+G+ KVI 

Sbjct: 294 SAGLAKFVASIATYPHEVVRTRLRQTP KEN G KRKYT-GLVQSFKVI 338 

Query: 72 LEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

+++EG S++ GL P+L+ P+ I F + 
Sbjct: 339 IKEEGLFSMYSGLTPHLMRTVPNSIIMFGTW 369 
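
The percentages printed beside the identity and positive counts of the three HSPs above are consistent with integer truncation of the ratio rather than rounding (28/91 is about 30.8 % but is printed as 30 %). A minimal Python sketch, in which the helper name `pct` is hypothetical, reproduces the printed values under that assumption:

```python
def pct(matches, aligned_length):
    """Percentage of matching columns, truncated to an integer."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


# (matches, aligned length) pairs copied from the three HSPs above,
# identities first, then positives.
hsps = [
    ((55, 133), (80, 133)),
    ((25, 88), (39, 88)),
    ((28, 91), (45, 91)),
]

for identities, positives in hsps:
    print(f"Identities = {identities[0]}/{identities[1]} ({pct(*identities)}%), "
          f"Positives = {positives[0]}/{positives[1]} ({pct(*positives)}%)")
```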

Pedant information for DKFZphtes3_20ml8, frame 1 



Report for DKFZphtes3_20ml8.1 

[LENGTH] 132 

[MW] 13993.36 

[pi] 8.42 

[HOMOL] PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast (Saccharomyces 
cerevisiae) 7e-19 

[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 3e-20 

[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 3e-10 

[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YEL006w] 1e-09 

[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. 
cerevisiae, YIL006w] 3e-09 

[FUNCAT] 07.04.07 anion transporters (cl, so4, po4, etc.) [S. cerevisiae, YKL120w] 
2e-08 

[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YPR011c] 3e-08 

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKR052c] 4e-08 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 
2e-07 

[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 5e-05 

[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YJR077c] 7e-05 

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YJR077c] 7e-05 

[BLOCKS] BL00215B Mitochondrial energy transfer proteins 

[BLOCKS] BL00215A Mitochondrial energy transfer proteins 

[PIRKW] duplication 6e-09 

[PIRKW] transmembrane protein 6e-09 

[PIRKW] mitochondrial inner membrane 4e-07 

[PIRKW] transport protein 5e-06 

[PIRKW] mitochondrion 7e-08 

[PIRKW] chloroplast 3e-08 

[SUPFAM] Btl protein 3e-08 

[SUPFAM] ADP, ATP carrier protein repeat homology 4e-09 

[SUPFAM] Caenorhabditis probable carrier protein c2 4e-09 

[SUPFAM] probable carrier protein YPR021C 6e-09 

[PROSITE] LEUCINE_ZIPPER 1 

[PROSITE] MITOCH_CARRIER 1 

[PFAM] Mitochondrial carrier proteins 

[KW] Alpha_Beta 

SEQ MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVV 
PRD cccccceeeecccccccceeeeeecchhhhhhhhhhhccccccccccccccccccccccc 

SEQ SPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFDPDSTQV 

PRD cccchhhhhhhhhhcccceeeeccccceeeecccceeeeeehhhhhhhhhcccccccccc 

SEQ HMISAAMAGMNV 

PRD chhhhhhhcccc 



Prosite for DKFZphtes3_20ml8 . 1 

PS00029 27->49 LEUCINE_ZIPPER PDOC00029 

PS00215 26->36 MITOCH_CARRIER PDOC00189 



Pfam for DKFZphtes3_20ml8 . 1 



HMM_NAME Mitochondrial carrier proteins 

HMM *pFwk.dFLAGGIAGmMeHTvMFPIDtIKTRMQlQgEMpM. .ahpR 

+++++++AGG +G + +++++P++++KTR+Q++ ++ + ++ 
Query 5 DTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSS-SVTLYISEVQLNTMA 52 

HMM YkGMIdCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFY 

G+++C++ I+++EG+R+L+RGLG+N+++++P +AI+F+ Y 
Query 53 GASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102 

HMM EFMKeMFiDyf geddnyWmWFwmnYMaGs* 

+KE ++D F++ D++++++ + +MAG+ 
Query 103 SNCKEKLNDVFDP-DSTQVHMISAAMAGM 130 
group: signal transduction 

DKFZphtes3_21d4 encodes a novel 464 amino acid putative GTP exchanging factor related to RCC1. 

RCC1 (regulator of chromosome condensation) is a eukaryotic protein which binds to chromatin 
and interacts with Ran, a nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP 
for GTP, acting as a guanine-nucleotide dissociation stimulator. 

The new protein can find application in the regulation of gene expression by activation of 
nuclear GTP-binding proteins. X-linked retinitis pigmentosa results from a defective GTPase 
regulator, which contains an RCC1-type repeat. 

similarity to RCC1-like G exchanging factor RLG 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="20" 
Insert length: 2321 bp 

Poly A stretch at pos. 2293, polyadenylation signal at pos. 2262 



1 GGGTCACGCA AGATGGCGGC GCCCAGAGGC TGCTGAGGCG CGGAACGGAG 
51 GATGGCGCTG GTGGCGTTGG TGGCTGGGGC TCGGCTGGGG CGGCGGCTGA 
101 GCGGGCCGGG GCTGGGGCGA GGGCACTGGA CGGCGGCCAG GCGCTCCCGG 
151 AGCCGGCGCG AAGCGGCAGA AGCCGAGGCG GAGGTGCCCG TGGTCCAGTA 
201 CGTGGGCGAG CGCGCTGCCC GCGCCGATCG CGTCTTCGTG TGGGGCTTCA 
251 GCTTCTCGGG GGCGCTGGGC GTGCCTTCCT TTGTGGTGCC CAGCTCCGGG 
301 CCCGGGCCCC GCGCCGGCGC CCGACCGCGC CGCAGGATCC AGCCCGTGCC 
351 CTATCGCCTG GAGCTGGACC AAAAGATTTC ATCTGCTGCT TGCGGCTATG 
401 GATTCACACT GCTGTCCTCT AAGACTGCGG ATGTTACGAA AGTCTGGGGG 
451 ATGGGACTCA ACAAAGATTC TCAGCTTGGA TTTCACAGGA GCCGGAAAGA 
501 TAAAACGAGG GGCTACGAGT ATGTGTTGGA GCCCTCACCC GTCTCCCTGC 
551 CTCTGGACAG ACCTCAGGAG ACACGGGTGC TGCAGGTCTC CTGCGGCCGA 
601 GCTCACTCTC TTGTGTTGAC TGACAGGGAA GGAGTCTTCA GCATGGGAAA 
651 CAATTCTTAT GGGCAATGTG GAAGAAAGGT GGTCGAAAAT GAAATTTACA 
701 GTGAAAGTCA CAGAGTCCAC AGGATGCAGG ACTTCGATGG CCAGGTGGTC 
751 CAGGTCGCCT GTGGTCAGGA TCATAGTCTG TTCCTGACGG ATAAAGGAGA 
801 AGTCTATTCT TGTGGATGGG GTGCTGATGG GCAAACAGGT CTGGGTCACT 
851 ACAATATCAC CAGCTCGCCC ACCAAGCTGG GTGGAGACCT GGCGGGAGTG 
901 AACGTTATCC AAGTTGCCAC CTACGGTGAT TGCTGCCTGG CCGTGTCCGC 
951 CGACGGAGGA CTTTTTGGTT GGGGAAACTC GGAGTACCTG CAGCTGGCCT 
1001 CTGTCACTGA CTCCACACAG GTGAATGTGC CCCGCTGCTT ACACTTCTCA 
1051 GGAGTGGGGA AGGTGCGACA GGCTGCATGC GGTGGCACGG GCTGTGCAGT 
1101 GTTAAACGGA GAAGGACATG TTTTTGTCTG GGGCTATGGA ATTCTTGGGA 
1151 AAGGTCCAAA CCTAGTGGAA AGTGCCGTCC CTGAAATGAT TCCACCCACT 
1201 CTCTTTGGCT TGACGGAGTT CAACCCAGAA ATCCAGGTTT CCCGCATCCG 
1251 ATGTGGACTC AGCCACTTTG CTGCACTGAC CAACAAAGGA GAGCTGTTTG 
1301 TATGGGGCAA GAACATCCGA GGGTGCCTGG GAATCGGTCG CCTGGAGGAC 
1351 CAGTATTTCC CATGGAGGGT GACGATGCCT GGGGAGCCTG TGGACGTGGC 
1401 ATGTGGCGTG GACCACATGG TGACCCTGGC CAAGTCATTC ATCTAAACCT 
1451 CCCTCACCTG CTTGGGCGGC CCCGTCCCGG GAACCACTGG CACTCCTTGG 
1501 CAGAGGCCAG CGCGTGGCCA GCCCCCCGGG GTTCTTGGAT GGTGGTGGCG 
1551 GAGGACCCTG CGTGCAGTGT GACGCTCTGT CCTGAATCCC TTAGCGGGTA 
1601 CCTACCAGGA GGATCAGGGC AAGGTCCCTC TCCAGCTGCA GGTGAGGCCT 
1651 GCGGAACTCA GCTTGGATGG CAGCCTTTGG TGGGCCGCTG TGGCCCGCAC 
1701 GTCTCTGTTC TCTCCAAGTA ACATGCGACG GTGTCTGGTG TCACGTCTCG 
1751 CCTGAGAAGC CCGTCTTAGG AAAGCTTAGC TTGAACACAG TGCTCGGGAG 
1801 GTTTCTGCTC TGTCTGTCAT GGCAGTCTCT TGGTTTGTGT CTGGCCAAGG 
1851 CCATGCGTGT GCCTCGGACC GAGCCCCAGC TTAGGCGAGG GAGTCAGGCT 
1901 GGCTTCGGCC CTCGGTTTTC ATTCAGGCCA CCCTGCTCAT GGCCCTTCCT 
1951 GGCCGCCTGC CACACCGCAA GCTCGCTGGG GGGACACTAG AAGCACCGTG 
2001 GCCTGGGATT CCATCTGGAG CTGTCCGCAG GCACCAGCCC CAGCCTCCCA 
2051 CCACGCTCAC TGCCTGGCTT GGAAAAGTTA AGAAGCCCCT CAGGAAGAGA 
2101 ATCGAGGCTA AGTTCCTCTG CGCCGAGGGC CCCGAGCATA TCCGCCAAGG 
2151 CTCAGCTGCA GTGCCAGGCG GAGGAGGAAG ATCCAGAAAT TGTGAACAAT 
2201 GTTTGATTTA GTAGCGTGAC TTGCCTTTCC CTTTAAAAAC ATCTTTTACA 
2251 AATCTGTCTT GGAATAAAGT CTATTTTCTG CCTTTTGGTT TTTAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 
Entry HS203358 from database EMBL: 
human STS SHGC-31781. 
Score = 1748, P = 1.1e-72, identities = 376/394 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 52 bp to 1443 bp; peptide length: 464 
Category: similarity to known protein 



1 MALVALVAGA RLGRRLSGPG LGRGHWTAAR RSRSRREAAE AEAEVPVVQY 
51 VGERAARADR VFVWGFSFSG ALGVPSFVVP SSGPGPRAGA RPRRRIQPVP 
101 YRLELDQKIS SAACGYGFTL LSSKTADVTK VWGMGLNKDS QLGFHRSRKD 
151 KTRGYEYVLE PSPVSLPLDR PQETRVLQVS CGRAHSLVLT DREGVFSMGN 
201 NSYGQCGRKV VENEIYSESH RVHRMQDFDG QVVQVACGQD HSLFLTDKGE 
251 VYSCGWGADG QTGLGHYNIT SSPTKLGGDL AGVNVIQVAT YGDCCLAVSA 
301 DGGLFGWGNS EYLQLASVTD STQVNVPRCL HFSGVGKVRQ AACGGTGCAV 
351 LNGEGHVFVW GYGILGKGPN LVESAVPEMI PPTLFGLTEF NPEIQVSRIR 
401 CGLSHFAALT NKGELFVWGK NIRGCLGIGR LEDQYFPWRV TMPGEPVDVA 
451 CGVDHMVTLA KSFI 



BLASTP hits 

Entry CEW09G3_5 from database TREMBLNEW: 
gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 
Score = 395, P = 9.3e-37, identities = 111/330, positives = 165/330 

Entry Y032_HUMAN from database SWISSPROT: 
HYPOTHETICAL PROTEIN KIAA0032. 
Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry B38919 from database PIR: 
hypothetical protein 2 - human (fragment) 
Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308 

Entry AF060219_1 from database TREMBLNEW: 
product: "RCC1-like G exchanging factor RLG"; Homo sapiens RCC1-like G 
exchanging factor RLG mRNA, complete cds. 
Score = 273, P = 4.0e-21, identities = 84/262, positives = 124/262 

Entry S71752 from database PIR: 
giant protein p619 - human 
Score = 282, P = 1.1e-19, identities = 86/287, positives = 144/287 

Alert BLASTP hits for DKFZphtes3_21d4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_21d4, frame 1 



Report for DKFZphtes3_21d4 . 1 



[LENGTH] 464 
[MW] 49997.08 
[pi] 8.74 
[HOMOL] TREMBL:CEW09G3_5 gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 5e-34 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins 
[S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 
cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 04.03.03 trna processing [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL097w] 2e-09 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YAL020C] 
[BLOCKS] BL00870I 
[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1) proteins 
[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins 
[PIRKW] blocked amino end 3e-16 
[PIRKW] nucleus 3e-16 
[PIRKW] duplication 4e-08 
[PIRKW] tandem repeat 3e-16 
[PIRKW] DNA binding 3e-16 
[PIRKW] mitosis 3e-16 
[PIRKW] leucine zipper 3e-21 
[SUPFAM] pheromone response pathway component SRM1 4e-08 
[SUPFAM] WD repeat homology 3e-21 
[PROSITE] MYRISTYL 7 
[PROSITE] RCC1_2 2 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 5 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] GLYCOSAMINOGLYCAN 3 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Regulator of chromosome condensation (RCC1) 
[KW] All_Beta 
[KW] LOW_COMPLEXITY 13.58 % 





SEQ MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR 

SEG .xxxxxxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhheeeccccccccchhhhhhhhhhhhhhhhhhhceeeeeehhhhhhhhh 

SEQ VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD eeeeccccccccccceeeeeccccccccccccccccccccchhhhhhhheeeccccceee 

SEQ LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS 

SEG 

PRD eecccccceeeeccccccccccccccccccccccceeeeeccccccccccccccceeeee 

SEQ CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD 

SEG 

PRD cccceeeeeeccceeeeeccccccccccccccccccccccccccccccceeeeeeecccc 

SEQ HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA 

SEG 

PRD eeeeeecccceeeecccccccccccccccccccccccccccceeeeeeecccceeeeeec 

SEQ DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW 

SEG 

PRD ccceeeeccccccccccccccccccccccccccccceeeeeccccceeeeeecccceeee 

SEQ GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK 

SEG 

PRD cccccccccccccccccccccceeeeeeecccceeeeeeecccceeeeeecccceeeecc 

SEQ NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI 

SEG 

PRD cccccccccccccccccceeecccceeeeecccccccccccccc 



Prosite for DKFZphtes3_21d4 . 1 



PS00001 200->204 ASN_GLYCOSYLATION PDOC00001 
PS00001 268->272 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 82->86 GLYCOSAMINOGLYCAN PDOC00002 
PS00002 333->337 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005 
PS00005 219->222 PKC_PHOSPHO_SITE PDOC00005 
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005 
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005 
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006 
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006 
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006 
PS00006 290->294 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00007 209->217 TYR_PHOSPHO_SITE PDOC00007 
PS00007 208->217 TYR_PHOSPHO_SITE PDOC00007 
PS00008 9->15 MYRISTYL PDOC00008 
PS00008 20->26 MYRISTYL PDOC00008 
PS00008 133->139 MYRISTYL PDOC00008 
PS00008 238->244 MYRISTYL PDOC00008 
PS00008 277->283 MYRISTYL PDOC00008 
PS00008 302->308 MYRISTYL PDOC00008 
PS00008 344->350 MYRISTYL PDOC00008 
PS00009 12->16 AMIDATION PDOC00009 
PS00009 206->210 AMIDATION PDOC00009 
PS00626 179->190 RCC1_2 PDOC00544 
PS00626 235->246 RCC1_2 PDOC00544 
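
The two ASN_GLYCOSYLATION entries in the Prosite table can be reproduced by scanning the peptide with the PS00001 pattern N-{P}-[ST]-{P}. The Python sketch below is illustrative only: `scan` is a hypothetical helper, the peptide is copied from the SEQ rows of the report above, and the end coordinate is assumed to be one past the last matched residue, which is how the table appears to number its hits.

```python
import re

# Peptide as printed in the SEQ rows of the report above (464 residues).
PEPTIDE = (
    "MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR"
    "VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL"
    "LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS"
    "CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD"
    "HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA"
    "DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW"
    "GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK"
    "NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI"
)

# PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}.
# The lookahead lets overlapping candidate sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def scan(pattern, sequence):
    """Return (start, end) pairs: 1-based start, end one past the match."""
    hits = []
    for match in pattern.finditer(sequence):
        start = match.start() + 1
        end = start + len(match.group(1))
        hits.append((start, end))
    return hits


hits = scan(N_GLYCOSYLATION, PEPTIDE)
for start, end in hits:
    print(f"PS00001 {start}->{end} ASN_GLYCOSYLATION PDOC00001")
```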



Pfam for DKFZphtes3_21d4 . 1 



HMM_NAME Regulator of chromosome condensation (RCC1) 

HMM *IAaGqHHTVCLTqDGRVYtWG* 

+A GQ+H++ LT++G VY++G 
Query 235 VACGQDHSLFLTDKGEVYSCG 255 
DKFZphtes3_21j15 



group: transcription factors 

DKFZphtes3_21j15 encodes a novel 898 amino acid protein with similarity to human NY-CO-33 
protein. 

NY-CO-33 is a protein recognised by autologous antibodies of human colon cancer patients. The 
novel protein contains 4 C2H2 zinc fingers and is a new putative transcription factor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



strong similarity to "NY-CO-33" 

complete cDNA, complete cds, potential start at bp 27, EST hits 

Sequenced by LMU 

Locus: unknown

Insert length: 4407 bp

Poly A stretch at pos. 4321, polyadenylation signal at pos. 4301



1 CGCTGCAGCA GGTGTCACAG AGCCGCATGC TCCCGGAGCC CAGCCTCTTC
51 AGCACCGTGC AGCTGTACCG GCAGAGCAGC AAGCTCTATG GCTCCATCTT
101 CACGGGGGCC AGCAAGTTCC GCTGTAAGGA CTGCAGCGCT GCCTACGACA
151 CCCTGGTGGA GTTGACAGTG CACATGAACG AGACGGGGCA TTACCGCGAC
201 GACAACCATG AGACCGATAA CAACAACCCC AAGCGCTGGT CCAAGCCTCG
251 CAAACGCTCC TTGCTGGAAA TGGAAGGGAA GGAAGACGCC CAGAAGGTGC
301 TGAAGTGCAT GTACTGTGGC CACTCCTTTG AGTCCCTGCA GGATTTGAGT
351 GTCCATATGA TCAAAACAAA ACACTACCAA AAAGTGCCTC TGAAGGAACC
401 CGTCACTCCT GTCGCCGCCA AAATCATCCC TGCCACTCGG AAGAAAGCTT
451 CCCTGGAGCT GGAGCTCCCC AGCTCCCCAG ATTCCACAGG TGGAACCCCC
501 AAAGCCACCA TCTCAGACAC CAACGATGCA CTTCAGAAGA ACTCCAACCC
551 TTACATCACG CCAAATAATC GGTACGGCCA CCAGAATGGG GCCAGCTATG
601 CATGGCACTT TGAGGCCCGG AAGTCGCAGA TCCTGAAGTG CATGGAGTGT
651 GGGAGCTCGC ATGACACCCT GCAGGAGCTC ACTGCCCACA TGATGGTCAC
701 TGGCCACTTC ATCAAGGTCA CCAACTCTGC TATGAAAAAG GGGAAGCCCA
751 TTGTGGAGAC GCCTGTCACA CCTACCATCA CAACCCTGCT GGATGAGAAG
801 GTCCAGTCCG TGCCCCTGGC AGCCACCACC TTCACGTCCC CCTCCAATAC
851 ACCTGCCAGC ATCTCCCCAA AACTGAATGT GGAGGTCAAG AAGGAAGTCG
901 ACAAGGAGAA AGCGGTCACT GACGAGAAAC CTAAGCAAAA AGACAAGCCT
951 GGCGAAGAAG AGGAGAAGTG TGACATCTCT TCCAAATACC ATTACTTGAC
1001 TGAAAATGAC TTAGAAGAGA GTCCCAAGGG GGGGCTTGAT ATCCTCAAAT
1051 CCTTGGAAAA CACAGTGACA TCCGCAATCA ACAAGGCCCA GAACGGCACT
1101 CCTAGCTGGG GGGGCTATCC CAGCATCCAT GCCGCCTACC AACTTCCCAA
1151 CATGATGAAG TTGTCCCTGG GCTCGTCGGG GAAGAGCACG CCCCTGAAAC
1201 CCATGTTTGG CAACAGTGAG ATTGTCTCCC CGACGAAAAA CCAGACCCTG
1251 GTCTCTCCAC CCAGCAGCCA GACGTCCCCC ATGCCCAAGA CAAACTTTCA
1301 TGCCATGGAG GAGCTGGTGA AAAAGGTCAC TGAGAAAGTT GCCAAAGTGG
1351 AGGAGAAGAT GAAGGAGCCG GATGGGAAGC TTTCCCCGCC CAAGCGGGCC
1401 ACTCCCTCCC CATGTAGCAG CGAAGTCGGG GAACCCATCA AGATGGAGGC
1451 ATCCAGCGAT GGGGGCTTCC GCAGCCAGGA GAACAGCCCC AGCCCCCCGC
1501 GGGATGGGTG CAAGGATGGG AGCCCCCTCG CTGAGCCGGT GGAGAATGGC
1551 AAGGAGCTGG TGAAGCCCCT AGCCAGCAGT TTGAGTGGCA GCACGGCCAT
1601 CATCACCGAC CACCCGCCTG AACAGCCTTT TGTTAACCCT TTGAGCGCCC
1651 TGCAGTCAGT CATGAACATT CACCTGGGCA AGGCCGCCAA GCCCTCCCTG
1701 CCTGCCCTGG ACCCCATGAG CATGCTTTTC AAGATGAGCA ACAGCCTGGC
1751 GGAGAAGGCT GCTGTGGCCA CCCCGCCGCC CCTGCAGTCC AAGAAGGCAG
1801 ACCACCTCGA CCGCTATTTC TACCACGTCA ACAACGACCA GCCCATAGAC
1851 TTGACAAAAG GGAAGAGTGA CAAAGGCTGC TCCTTGGGTT CAGTGCTTCT
1901 GTCACCCACG TCCACAGCCC CGGCAACCTC CTCATCCACG GTGACAACGG
1951 CAAAGACATC TGCCGTCGTA TCATTCATGT CAAACTCGCC GCTACGCGAG
2001 AATGCCTTGT CAGATATATC CGATATGCTG AAGAACTTGA CAGAGAGCCA
2051 CACGTCAAAA TCCTCCACTC CTTCCAGCAT CTCCGAGAAG TCTGACATTG
2101 ACGGGGCCAC TCTGGAGGAG GCTGAGGAGT CGACGCCCGC CCAGAAGAGG
2151 AAGGGCCGCC AGTCAAACTG GAACCCCCAG CACCTCCTGA TCCTCCAGGC
2201 CCAGTTTGCC GCCAGCCTCC GGCAGACCTC AGAAGGGAAG TACATCATGT
2251 CAGACCTGAG CCCCCAGGAG CGGATGCATA TCTCCAGGTT CACCGGGCTG
2301 TCCATGACCA CCATCAGCCA CTGGCTGGCC AACGTGAAAT ACCAGCTTCG
2351 AAGGACAGGT GGAACAAAGT TCCTCAAAAA CTTGGACACT GGCCACCCCG
2401 TCTTCTTTTG TAACGATTGT GCGTCCCAAA TCAGGACTCC TTCCACGTAC
2451 ATCAGTCACC TAGAGTCACA CTTAGGCTTC CGGCTACGGG ACTTATCCAA
2501 ACTGTCCACC GAACAGATTA ACAGTCAGAT AGCACAAACC AAGTCACCGT
2551 CAGAAAAAAT GGTGACGTCC TCCCCCGAGG AAGACCTGGG GACTTCCTAT
2601 CAGTGCAAAC TTTGCAATCG GACCTTTGCC AGCAAGCACG CTGTTAAACT
2651 TCACCTTAGC AAAACACACG GGAAATCTCC GGAAGACCAC CTTCTGTATG
2701 TCTCTGAGTT AGAGAAGCAG TAGCATTTGC TTTTGATAGA AAGGACTGCA
2751 GTTTGCTTTG AGGGAAACTG TGGAAGGCAC CTTCAGGCCC CCTCTGACTT
2801 GTTGTTCTTG GCACATGTTC TTATTTTAAC TGCAGAGAAT CACTCTGGGC
2851 TGGACTGTTT TGTATAACTG TACAGTGTTT AATAGAGGTG CATAATCAGC
2901 TGTTGTTACT GGTAAAATAT GAAGGTTAAA ATGCAGTGGT AAGTGTTTGG
2951 AACTTTGTGT AAACGGGATT TAGTTGTGAG CATCCTCCCG ATGCTTCAAG
3001 CTGCATGCAT TAACAGACAG TTTAATTAAG CATTTATAAC GGAATCAGGC
3051 ACACCTTTTC CACGAGACTC GAGTGTGCTG GCATTTCTCA CCCTTTCATC
3101 TTTAGCCCTC TGAGTACTTT GAAGCACTTT TGCATTAATT TGGTTAAAAA
3151 ATAAAATAAA ATAATAATAA TGTATGAAGC TCTGTTTTTT AAACTCCTTA
3201 CCAGCTTAGT TATAATGAAT AATATGAACC TCCATTTATG CAGGTCTGCA
3251 GGGGTATAAC ACGCCTTGAA ATTTAAAAGA ATATTATTTT CACATTGAAA
3301 CATAGATGTA TATATTGTAT AGATTTCAGA CTCTCTTATG AAAAAAAATG
3351 TGATTGTGGT TAAATGACCT TTTTCTTGCA TTTATAGCAA CAGTGTTTTA
3401 TGCACCTGCT ATGCTCTGGG CATAAGCTGT GCCTATGTAT AGTGTATATT
3451 TCTTTTTTTC TTTTTTTTAA GGTCTATGGG TTTTGTTTTT TACATGCAAA
3501 CATTGTAAAT TATACAGAAG ATACCACAGA TAGCATTTAT AAAGTATACA
3551 GAAACATTAT CTGAAAGCAA AGTATGATAG TTTGTTTTGC TATACAGTAC
3601 ATCTATATTG ATAGAGGTTC ATGTTTAAAT TATACATATT TATTAGCATC
3651 ATATTGTCAT TTGTTTTGAG CAGTCTGAAT AAACGAGACC GGGAAAGACA
3701 TCCCTGGCAG GCATCAGAAC TATTTTGCAC ATGATTTTTA AAGGTATTTA
3751 TTAGAAATCA AAGAACACTC AAAATAAACT CAGTGCTCAA AGGGTTAAGT
3801 CTATTTGAAA AGGTTAAAAA AAAGAACAAA AAAAAAAAAA GAACTTGTAC
3851 TGTATTTCCT AAACATTGAT AAAGCCTTTA AAATGTTTGT ACTGTAATAC
3901 TTTGCTTAAA AGTCATGAGG CATTCTGTGA TCCAACCTCT TTCACTTATT
3951 TATAAGCCCT CTTGGTTGCT ATTCCATATT GTAGGATGCC TTTCTATTTC
4001 AATTGGTAAC TTTCTGTTTT GTTCTTCCTA ATTATTCTCC CAAGATCCCA
4051 CACTGCAGCT TTATCTTTAG GCTTATGAAA GGTAACCCGT GGTTACCGGC
4101 TCTCCAAGTG ATTCTGTTCT TCTCCATTTT TGGCAGTTAA TTTGCAGAAG
4151 TAACTGACAG CTGACACCAT ATGAGAACCT TTGTATAAAA TATTGGCATG
4201 TAAACAGCAC AGACACCGTA ACACACTCTG TGCCCTGTTT GGTTGTTGAC
4251 AATGAAGCAC CATTATGTGA CTCTTCATAT AACCCTTTTT TCTACGGCAG
4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA AAAAAAAAAA AAAAAAAAAA
4351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4401 AAAAAAA



BLAST Results 



NO BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 27 bp to 2720 bp; peptide length: 898 
Category: strong similarity to known protein 
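
The ORF coordinates reported for each clone can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the helper names `orf_summary` and `translate` are hypothetical, the codon table is the standard genetic code, and `FIRST_ROW` is the first 50 bases of the insert as read from the first row of the nucleotide listing above (assuming the two printed column groups of that row belong together). The reported coordinates appear to exclude the stop codon, since (2720 - 27 + 1) / 3 is exactly 898.

```python
# Standard genetic code, indexed by base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_summary(start_bp, end_bp):
    """Return (frame, peptide_length) for 1-based, inclusive ORF coordinates."""
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start_bp - 1) % 3 + 1  # frame 1 starts at bp 1, frame 2 at bp 2, ...
    return frame, length // 3


def translate(dna, start_bp, n_codons):
    """Translate n_codons codons beginning at the 1-based position start_bp."""
    first = start_bp - 1
    codons = (dna[first + 3 * i:first + 3 * i + 3] for i in range(n_codons))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Bases 1-50 of the insert (assumed pairing of the printed columns).
FIRST_ROW = "CGCTGCAGCAGGTGTCACAGAGCCGCATGCTCCCGGAGCCCAGCCTCTTC"

print(orf_summary(27, 2720))         # (3, 898)
print(translate(FIRST_ROW, 27, 8))   # MLPEPSLF
```

The same check gives frame 1 and 66 residues for an ORF at 316-513 bp, and frame 2 and 817 residues for an ORF at 71-2521 bp.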



1 MLPEPSLFST VQLYRQSSKL YGSIFTGASK FRCKDCSAAY DTLVELTVHM 
51 NETGHYRDDN HETDNNNPKR WSKPRKRSLL EMEGKEDAQK VLKCMYCGHS
101 FESLQDLSVH MIKTKHYQKV PLKEPVTPVA AKIIPATRKK ASLELELPSS
151 PDSTGGTPKA TISDTNDALQ KNSNPYITPN NRYGHQNGAS YAWHFEARKS 
201 QILKCMECGS SHDTLQELTA HMMVTGHFIK VTNSAMKKGK PIVETPVTPT 
251 ITTLLDEKVQ SVPLAATTFT SPSNTPASIS PKLNVEVKKE VDKEKAVTDE 
301 KPKQKDKPGE EEEKCDISSK YHYLTENDLE ESPKGGLDIL KSLENTVTSA 
351 INKAQNGTPS WGGYPSIHAA YQLPNMMKLS LGSSGKSTPL KPMFGNSEIV 
401 SPTKNQTLVS PPSSQTSPMP KTNFHAMEEL VKKVTEKVAK VEEKMKEPDG 
451 KLSPPKRATP SPCSSEVGEP IKMEASSDGG FRSQENSPSP PRDGCKDGSP 
501 LAEPVENGKE LVKPLASSLS GSTAIITDHP PEQPFVNPLS ALQSVMNIHL 
551 GKAAKPSLPA LDPMSMLFKM SNSLAEKAAV ATPPPLQSKK ADHLDRYFYH 
601 VNNDQPIDLT KGKSDKGCSL GSVLLSPTST APATSSSTVT TAKTSAVVSF 
651 MSNSPLRENA LSDISDMLKN LTESHTSKSS TPSSISEKSD IDGATLEEAE 
701 ESTPAQKRKG RQSNWNPQHL LILQAQFAAS LRQTSEGKYI MSDLSPQERM
751 HISRFTGLSM TTISHWLANV KYQLRRTGGT KFLKNLDTGH PVFFCNDCAS
801 QIRTPSTYIS HLESHLGFRL RDLSKLSTEQ INSQIAQTKS PSEKMVTSSP 
851 EEDLGTSYQC KLCNRTFASK HAVKLHLSKT HGKSPEDHLL YVSELEKQ 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_21j15, frame 3

TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds., N = 1, Score =
1039, P = 5.5e-105

PIR:A38437 probable homeotic protein tsh - fruit fly (Drosophila
melanogaster), N = 3, Score = 158, P = 7.2e-09

TREMBL:CE33058_1 gene: "unc-89"; product: "UNC-89"; Caenorhabditis 
elegans UNC-89 (unc-89) gene, complete cds., N = 2, Score = 175, P = 
3.3e-07 



>TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo 
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 
Length = 687 

HSPs: 



Score = 1039 (155.9 bits), Expect = 5.5e-105, P = 5.5e-105
Identities = 244/504 (48%), Positives = 319/504 (63%) 



Qu e c y * 


170 


n^M<5WPVTTPNINRYr^Hi^Wf^A c W&WHFFARKC.riTT KTMFPfi^^ HFtTT.OFT TftrlMMVTflHFT 


229 










Sbjct: 


14 


QKAANPYVTPNNRYGYQNGASYTWQFEARKAQILKCMECGSSHDTLQQLTAHMMVTGHFL 


73 


Query : 


9 in 

Z JU 


WTMQAMtf UfcK PT UPTPUTPTTTTT T nFTfVnCA/PT A 71 TT VT ^ — P ^MT PA^T^PfcfTiW 


284 






KVT SA KKGK -t-V PV ++EK+QS+PL TT T P+++ P S + 




Sbjct: 


74 


KVTTSASKKGKQLVLDPV VEEKIQSIPLPPTTHTRLPASSIKKQPDSPAGSTT 


126 


Query: 


285 




343 






E KKE +KEK V + K K++ + EK + S+ Y YL E DL++SPKGGLDILKSL 




Sbjct: 


127 


SEEKKEPEKEKPPVAGDAEKIKEESEDSLEKFEPSTLYPYLREEDLDDSPKGGLDILKSL 


186 


Query: 


344 


RHTUT^ a TNK AfiNfiT P^WC^ YPS T H AA YOI.PNMMKI.SLGSSGKSTPLKPMF-GNS.EI VSP 


402 






ENTV++AI+KAQNG PSWGGYPSIHAAYQLP +K L ++ +S ++P + G + +S 




Sbjct : 


187 


ENTVSTAISKAQNGAPSWGGYPSIHAAYQLPGTVK-PLPAAVQSVQVQPSYAGGVKSLSS 


245 


Query: 


403 


TKNQTLVSPPSSQTSPMPKTNFHAMEELVKKVTF.KV-AKVEEKMKEPDGKLSPPKRATPS 


461 




++ L+ P S T P K+N AMEELV+KVT KV K EE+ E + K S K A S 




Sbjct: 


246 


AEHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKE-KSSLAKAA--S 


302 


Query: 


4 62 


PCSSEVG2PIKMEASSDGGFRS0ENSPSPPRDGCKDGSPLAEPVENGKELVKPLA33LSG 


521 






P + E + K E S + Q+ P K PL NG E +K ++ 




Sbjct: 


303 


PIAKENKDFPKTEEVSG KPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCN 


359 


Query: 


522 


STAI ITDHPPEQPFVNPLSALQS VMNI HLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVA 


581 




+ TI DH PE F+NPLSALQS+MN HLGK +KP P+LDP++ML+K+SNS+ +K 




Sbjct: 


360 


MT m TMnUCDCDCCTWDT CAT CiC TMMTUI flfl/CKDUCDCT nPT AMI VVTCUQWIT V>W\7VT> 


H l y 


Query: 


582 


TPPPLQSKKADHLDRYFYHVNNDQPIDLTKGKSDK-GCSLGSVLLSPTSTAPATSSSTVT 


640 






P K+AD +DRY+Y N+DQPIDLTK K+ S+ + SP + S + 




Sbjct: 


420 


ATPV KQADAIDRYYYE-NSDQPIDLTKSKNKPLVSSVADSVASPLRESALMDISDMV 


475 


Query: 


641 


TAKTSAVVS FMSN-S PLRENALS DI SDMLKNLTE 673 








T+ SS+E++DS +LE 




Sbjct: 


476 


KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDE 509 




Score = 865 (129.8 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 211/434 (48%), Positives = 268/434 (61%)




Query : 


447 


EPDGKLSPPKRATFSPCSSEVG--EPIKMEASSDGGFRSQENSPSPPRDG-CKDGSPLAE 


503 






E+LP TPPSV E+++ ++EP + K SP+A+ 




Sbjct: 


247 


EHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKEKSSLAKAASPIAK 


306 


Query : 


504 


P-VE— NGKELVK-PLASSLSGSTAIITD-HPPE— QPFVNPLSALQSVMNIHLG 


551 




P E +GK KPA+ DHP +P ++ ++I+ 




Sbjct: 


307 


ENKDFPKTEEVSGKPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCNNLGI IMD 


366 


Query: 


552 


KAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYHVNN DQPID 


608 






+ +PS ++P+S L+N+ K+ PL DL Y ++N D+P+ 




Sbjct : 


367 


HSPEPSF — INPLSALQSIMNTHLGKVSKPVSPSL DPL-AMLYKISNSMLDKPV- 


417 


Query : 


609 


LTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENALSDISDML 


668 




K S P + + S+V ++ SPLRE+AL DISDM+ 




Sbjct: 


418 


-YPATPVKQADAIDRYYYENSDQPIDLTKSKNKPLVSSVADSVA-SPLRESALMDISDMV 


475 


Query: 669 KNLTESHTSKSSTPSSISEKSDIDGATLEEA-EESTPAQKRKGRQSNWNPQHLLILQAQF 727
           KNLT   T KSSTPS++SEKSD DG++ EEA +E +P  KRKGRQSNWNPQHLLILQAQF
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDELSPVHKRKGRQSNWNPQHLLILQAQF 535

Query: 728 AASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 787
           A+SLR+T+EGKYIMSDL PQER+HIS+FTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD
Sbjct: 536 ASSLRETTEGKYIMSDLGPQERVHISKFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 595

Query: 788 TGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKSPSEKMV- 846
           TGHPVFFCNDCASQ RT STYISHLE+HLGF L+DLSKL   QI  Q   +K  + K +
Sbjct: 596 TGHPVFFCNDCASQFRTASTYISHLETHLGFSLKDLSKLPLNQIQEQQNVSKVLTNKTLG 655

Query: 847 -TSSPEEDLGTSYQCKLCNRTFASK 870
              + EEDLG+++QCKLCNRTFA +
Sbjct: 656 PLGATEEDLGSTFQCKLCNRTFAKQ 680




Score = 98 (14.7 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 32/95 (33%), Positives = 47/95 (49%)




Query : 


90 


KVLKCMYCGHSFESLQDLSVHMIKTKHYQKVPL KEPVT- PVAAKI I PATRKKAS 


142 






++LKCM CG S ++LQ L+ HM+ T H+ KV K+ V PV + I + + 




Sbjct: 


45 


QILKCMECGSSHDTLQQLTAHMMVTGHFLKVTTSASKKGKQLVLDPVVEEKIQSIPLPPT 


104 


Query: 


143 


LELELPSS PDSTGGTPKATISDTNDALQKNSNP 175 








LP+S PDS G+ T S+ +K P 




Sbjct: 


105 


THTRLPASSIKKQPDSPAGS TTSEEKKEPEKEKPP 139 




Score = 81 (12.2 bits), Expect = 4.6e-93, P = 4.6e-93
Identities = 13/29 (44%), Positives = 20/29 (68%)

Query: 28 ASKFRCKDCSAAYDTLVELTVHMNETGHY 56
          A   +C +C +++DTL +LT HM  TGH+
Sbjct: 44 AQILKCMECGSSHDTLQQLTAHMMVTGHF 72
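
The percentages printed next to the Identities and Positives counts can be recomputed from the raw counts. In this report they appear to be truncated rather than rounded (211/434 is about 48.6 % but is printed as 48 %). The Python sketch below parses one such line and repeats the calculation; the function name `parse_hsp_counts` and the regular expression are illustrative assumptions and not part of the original analysis software.

```python
import re

# Matches lines of the form:
#   Identities = 211/434 (48%), Positives = 268/434 (61%)
HSP_LINE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_hsp_counts(line):
    """Extract the counts and recompute truncated integer percentages."""
    m = HSP_LINE.search(line)
    if m is None:
        raise ValueError("not an Identities/Positives line")
    ident, ident_len, ident_pct, pos, pos_len, pos_pct = map(int, m.groups())
    identity_pct = 100 * ident // ident_len   # integer division truncates
    positive_pct = 100 * pos // pos_len
    return {
        "identities": (ident, ident_len),
        "positives": (pos, pos_len),
        "identity_pct": identity_pct,
        "positive_pct": positive_pct,
        # True when the recomputed values agree with the printed ones.
        "matches_report": identity_pct == ident_pct and positive_pct == pos_pct,
    }


print(parse_hsp_counts("Identities = 211/434 (48%), Positives = 268/434 (61%)"))
```

Applied to the four HSP summaries of the NY-CO-33 hit, the recomputed truncated values agree with the printed percentages.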



Pedant information for DKFZphtes3_21j15, frame 3

Report for DKFZphtes3_21j15.3

[LENGTH]   898
[MW]       98486.72
[pI]       8.61
[HOMOL]    TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 0.0
[BLOCKS]   BL00028 Zinc finger, C2H2 type, domain proteins
[PIRKW]    zinc finger 1e-06
[PIRKW]    DNA binding 1e-06
[PIRKW]    transcription regulation 1e-06
[PROSITE]  MYRISTYL 9
[PROSITE]  ZINC_FINGER_C2H2 4
[PROSITE]  CAMP_PHOSPHO_SITE 5
[PROSITE]  CK2_PHOSPHO_SITE 19
[PROSITE]  TYR_PHOSPHO_SITE 2
[PROSITE]  PKC_PHOSPHO_SITE 15
[PROSITE]  ASN_GLYCOSYLATION 4
[PFAM]     Zinc finger, C2H2 type
[KW]       Alpha_Beta
[KW]       LOW_COMPLEXITY 11.36 %





SEQ MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHMNETGHYRDDN

SEG 

PRD ccccceeeeeeeeccccceeeeeeeccccceeecccchhhhhhhhhhhcccccccccccc 

SEQ HETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHSFESLQDLSVHMIKTKHYQKV 

SEG 

PRD cccccccccccccccchhhhhhhccchhhhhhhhhcccccchhhhheeeeeeeecceeee 

SEQ PLKEPVTPVAAKIIPATRKKASLELELPSSPDSTGGTPKATISDTNDALQKNSNPYITPN

SEG xxxxxxxxxx 

PRD eccccccceeeeeeehhhhhhhhhhcccccccccccccceeeeccchhhhhccccccccc 

SEQ NRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFIKVTNSAMKKGK 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhceeeeeccccccccc 

SEQ PIVETPVTPTITTLLDEKVQSVPLAATTFTSPSNTPASISPKLNVEVKKEVDKEKAVTDE 

SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccccccccccccccccccccccccccccccchhhhhhcc 

SEQ KPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSLENTVTSAINKAQNGTPS 






WO 01/12659 



SEG X 

PRD ccccccccccccccchhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhcccccc 

SEQ WGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMFGNSEIVSPTKNQTLVSPPSSQTSPMP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ KTNFHAMEELVKKVTEKVAKVEEKMKEPDGKLSPPKRATPSPCSSEVGEPIKMEASSDGG 

SEG XXXXXXXXXXXXXXXXXXXX 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeecccc 

SEQ FRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSGSTAIITDHPPEQPFVNPLS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccc 

SEQ ALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYH 

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccceeee 

SEQ VNNDQPIDLTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENA 

SEG xxxxxxxxxxxxxxxxxxxxxxxx 

PRD ecccccceeecccccccccccceeecccccccccccceeeeceeeeeeeeccccccchhh 

SEQ LSDISDMLKNLTESHTSKSSTPSSISEKSDIDGATLEEAEESTPAQKRKGRQSNWNPQHL 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccccccccccceeecccccchhhhhhhhccchhhhhhcccccccchh 

SEQ LILQAQFAASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGT 

SEG 

PRD hhhhhhhhhhhhhccccceeecccccchhhhhhhhccccchhhhhhhhhhhhhhhhcccc 

SEQ KFLKNLDTGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKS 

SEG 

PRD ceeecccccccceeecccceeeecccchhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcc 

SEQ PSEKMVTSSPEEDLGTSYQCKLCNRTFASKHAVKLHLSKTHGKSPEDHLLYVSELEKQ 

SEG 

PRD ccceeeeccccccccceeehhhhhhhhhhhhhhhhhccccccccccceeeeeeecccc 
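
The LOW_COMPLEXITY keyword in the Pedant report can be cross-checked against the SEG rows above, in which each `x` marks a residue masked as low complexity. The Python sketch below sums the masked runs and expresses them as a percentage of the 898-residue protein. The run lengths are read off the SEG rows as they survive in this text, which is an assumption about the extraction, and `low_complexity_percent` is a hypothetical helper name.

```python
def low_complexity_percent(masked_runs, protein_length):
    """Percentage of residues covered by SEG-masked runs, to two decimals."""
    masked = sum(masked_runs)
    return round(100.0 * masked / protein_length, 2)


# Lengths of the runs of 'x' in the SEG rows, in order of appearance.
SEG_RUNS = [10, 13, 16, 1, 20, 24, 18]

print(sum(SEG_RUNS))                          # 102
print(low_complexity_percent(SEG_RUNS, 898))  # 11.36
```

The result agrees with the 11.36 % given in the report, which suggests the SEG rows were extracted with their run lengths intact even though their column positions were lost.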



Prosite for DKFZphtes3_21j15.3



PS00001  51->55    ASN_GLYCOSYLATION  PDOC00001
PS00001  405->409  ASN_GLYCOSYLATION  PDOC00001
PS00001  670->674  ASN_GLYCOSYLATION  PDOC00001
PS00001  864->868  ASN_GLYCOSYLATION  PDOC00001
PS00004  69->73    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  75->79    CAMP_PHOSPHO_SITE  PDOC00004
PS00004  139->143  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  432->436  CAMP_PHOSPHO_SITE  PDOC00004
PS00004  456->460  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  17->20    PKC_PHOSPHO_SITE   PDOC00005
PS00005  137->140  PKC_PHOSPHO_SITE   PDOC00005
PS00005  157->160  PKC_PHOSPHO_SITE   PDOC00005
PS00005  280->283  PKC_PHOSPHO_SITE   PDOC00005
PS00005  318->321  PKC_PHOSPHO_SITE   PDOC00005
PS00005  332->335  PKC_PHOSPHO_SITE   PDOC00005
PS00005  384->387  PKC_PHOSPHO_SITE   PDOC00005
PS00005  435->438  PKC_PHOSPHO_SITE   PDOC00005
PS00005  588->591  PKC_PHOSPHO_SITE   PDOC00005
PS00005  614->617  PKC_PHOSPHO_SITE   PDOC00005
PS00005  641->644  PKC_PHOSPHO_SITE   PDOC00005
PS00005  676->679  PKC_PHOSPHO_SITE   PDOC00005
PS00005  686->689  PKC_PHOSPHO_SITE   PDOC00005
PS00005  730->733  PKC_PHOSPHO_SITE   PDOC00005
PS00005  842->845  PKC_PHOSPHO_SITE   PDOC00005
PS00006  42->46    CK2_PHOSPHO_SITE   PDOC00006
PS00006  78->82    CK2_PHOSPHO_SITE   PDOC00006
PS00006  103->107  CK2_PHOSPHO_SITE   PDOC00006
PS00006  149->153  CK2_PHOSPHO_SITE   PDOC00006
PS00006  161->165  CK2_PHOSPHO_SITE   PDOC00006
PS00006  210->214  CK2_PHOSPHO_SITE   PDOC00006
PS00006  214->218  CK2_PHOSPHO_SITE   PDOC00006
PS00006  253->257  CK2_PHOSPHO_SITE   PDOC00006
PS00006  325->329  CK2_PHOSPHO_SITE   PDOC00006
PS00006  573->577  CK2_PHOSPHO_SITE   PDOC00006
PS00006  684->688  CK2_PHOSPHO_SITE   PDOC00006
PS00006  689->693  CK2_PHOSPHO_SITE   PDOC00006
PS00006  695->699  CK2_PHOSPHO_SITE   PDOC00006
PS00006  745->749  CK2_PHOSPHO_SITE   PDOC00006
PS00006  810->814  CK2_PHOSPHO_SITE   PDOC00006
PS00006  840->844  CK2_PHOSPHO_SITE   PDOC00006
PS00006  848->852  CK2_PHOSPHO_SITE   PDOC00006
PS00006  884->888  CK2_PHOSPHO_SITE   PDOC00006
PS00006  893->897  CK2_PHOSPHO_SITE   PDOC00006
PS00007  732->740  TYR_PHOSPHO_SITE   PDOC00007
PS00007  883->892  TYR_PHOSPHO_SITE   PDOC00007
PS00008  22->28    MYRISTYL           PDOC00008
PS00008  156->162  MYRISTYL           PDOC00008
PS00008  188->194  MYRISTYL           PDOC00008
PS00008  362->368  MYRISTYL           PDOC00008
PS00008  479->485  MYRISTYL           PDOC00008
PS00008  494->500  MYRISTYL           PDOC00008
PS00008  498->504  MYRISTYL           PDOC00008
PS00008  617->623  MYRISTYL           PDOC00008
PS00008  757->763  MYRISTYL           PDOC00008
PS00028  795->816  ZINC_FINGER_C2H2   PDOC00028
PS00028  860->882  ZINC_FINGER_C2H2   PDOC00028
PS00028  33->56    ZINC_FINGER_C2H2   PDOC00028
PS00028  94->117   ZINC_FINGER_C2H2   PDOC00028
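
The Prosite hits listed above can be reproduced for the N-terminal part of the protein with a small pattern scanner. The Python sketch below is a minimal illustration and not the software used to generate the report: `prosite_to_regex` and `scan` are hypothetical helpers, the pattern strings are the PROSITE definitions as commonly documented (they should be checked against the current PROSITE release), and `N_TERMINAL` is residues 1-100 of the peptide given earlier. Hits are reported as `(start, end)` with a 1-based start and `end = start + match length`, which appears to be the convention of the `start->end` columns in this report.

```python
import re

# One pattern element: x, a residue, [allowed] or {excluded}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}' to a regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (start, end) pairs for all hits, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in regex.finditer(sequence):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


# Residues 1-100 of the DKFZphtes3_21j15 frame 3 peptide.
N_TERMINAL = (
    "MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHM"
    "NETGHYRDDNHETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHS"
)

PATTERNS = {
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "CAMP_PHOSPHO_SITE": "[RK](2)-x-[ST]",
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
    "ZINC_FINGER_C2H2": "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H",
}

for name, pattern in PATTERNS.items():
    print(name, scan(N_TERMINAL, pattern))
```

Only motifs that lie entirely within the first 100 residues are found by this fragment; for example the zinc finger at 94->117 extends beyond it and is not reported.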



Pfam for DKFZphtes3_21j15.3



HMM_NAME Zinc finger, C2H2 type

HMM *CpwPDCgKtFrrwsNLrRHMR..T.H*

C++ C ++ + +L+ HM+ H
Query 33 CKD--CSAAYDTLVELTVHMNET-GH 55

26.69 (bits) f: 94 t: 116 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMR..T.H*

C + CG +F + +L HM+ H

dkfzphtes3 94 cmy--cghsfeslqdlsvhmikt-kh 116

Query f: 795 t: 815 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
HMM *CpwPDCgKtFrrwsNLrRHMRTH*
C++ C R++S+++ H+ +H
Query 795 CND--CASQIRTPSTYISHLESH 815

27.12 (bits) f: 860 t: 881 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMR.T.H*

C+ C++TF +++ + H+ H

dkfzphtes3 860 CKL--CNRTFASKHAVKLHLSK-TH 881
group: intracellular transport and trafficking

DKFZphtes3_21l16 encodes a novel 66 amino acid protein nearly identical to rat ribosome
attached membrane protein 4 (ramp4).

The novel protein seems to be the human ortholog of rat ramp4. Ramp4 is involved in the
regulation of translocation of proteins into the endoplasmic reticulum, e.g. of the MHC class II
associated invariant (gamma) chain.

The new protein can find application in modulation of protein translocation into the
endoplasmic reticulum.



identical to rat ribosome attached membrane protein 4

ORF Bp 316-513 (66 aa) see BLASTX

Sequenced by LMU

Locus: unknown

Insert length: 2488 bp

Poly A stretch at pos. 2464, polyadenylation signal at pos. 2442



1 CTTCCTCTTT CACTCCGCGC TCACGGCGGC GGCCAAAGCG GCGGCGACGG
51 CGGCGCGAGA ACGACCCGGC GGCCAGTTCT CTTCCTCCTG CGCACCTGCC
101 CCGCTCGGTC AGTCAGTCGG CGGCCGGCGC CCGGCTTGTG CTCAGACCTC
151 GCGCTTGCGG CGCCCAGGCC CAGCGGCCGT AGCTAGCGTC TGGCCTGAGA
201 ACCTCGGCGC TCCGGCGGCG CGGGCACCAC GAGCCGAGCC TCGCAGCGGC
251 TCCAGAGGAG GCAGGCGAGT GAGCGAGTCC GAGGGGTGGC CGGGGCAGGT
301 GGTGGCGCCG CGAAGATGGT CGCCAAGCAA AGGATCCGTA TGGCCAACGA
351 GAAGCACAGC AAGAACATCA CCCAGCGCGG CAACGTCGCC AAGACCTCGA
401 GAAATGCCCC CGAAGAGAAG GCGTCTGTAG GACCCTGGTT ATTGGCTCTC
451 TTCATTTTTG TTGTCTGTGG TTCTGCAATT TTCCAGATTA TTCAAAGTAT
501 CAGGATGGGC ATGTGAAGTG ACTGACCTTA AGATGTTTCC ATTCTCCTGT
551 GAATTTTAAC TTGAACTCAT TCCTGATGTT TGATACCCTG GTTGAAAACA
601 ATTCAGTAAA GCATCCTGCC TCAGAATGAC TTTCCTATCA TGCTTCATGT
651 GTCATTCCAA GGTTTCTTCA TGAGTCATTC CAAGTTTTCT AGTCCATACC
701 ACAGTGCCTT GCAAAAAACA CCACATGAAT AAAGCAATAA AATTTGATTG
751 TTAAGATACA GTAGTGGACC CTACTTATTC AGTCAATTAA GAGTAAGTTT
801 TTTTATGTGG TTATTAAAAC AGTATGAACA ATTAGTCTAA CTCTGCATAG
851 ACAGGGTCTA GATTTTGTTA ACCCAAATGT ATAACTGCAG TTAGCTTAAA
901 TTACAATTTG AAGTCTTGTG GTTTTTATAT AGCTAGGCAC TTTATTACTC
951 TTTTGAACTG AAAGCACACT CCCTTATAGG TTCATGTAAC TGTCCTGTAA
1001 TAAGGTGCTT ATAAATGGAA CAACTACACA GCCTAGTTTT GCCACAACCT
1051 TTAGCATCTA AAAAGTTTTA AAAGCTTCTA AATGTCTAAT ATAAAGGGAG
1101 ATGCTTATAG CCACAACATC TATTTTACCA ATATTGTTTC CATTACACTA
1151 CCTTGGATTT TGCATGAGTG AGTATAGTAA CCCAAGATGC CATAAAAAAA
1201 AACTTGATCG TTTTCTGACT TAATTAGTTA CTGTGGTTTC ACTAAAAGCT
1251 ACCGTGGTGG AGTGAAGTCA GTCAGGGAAG GTTTGTTTAT GTTACATTTA
1301 TTTCACCAGA ACTATTTTAA TATATCAAAG GGGTTTACTA TGCCAAACAA
1351 AATTCTAGGG AAAAATACTG CTAAAAATGG ATGCCTCATC AGAACATGCT
1401 GTTGAGTCCA ATGTGCCATA AGACATTTTA GCATGTTAAA TAGCACTTTT
1451 AATAGCAAAA AAAGGCACAT CAACTGCGAA GTTATCCTTA GTTTGCAAAT
1501 GCTTTTTCTA GATTAATGAT TTTTCAATCA TTAGGGTACT AGACACATCA
1551 GCCTAAAGTG GCATCTGGAA TTGAATGGAT TTACTGATAA TGATCAGTCT
1601 TTAGTCTTCC CTTTGTTATA TGACTTTATA GGTTATGATT GATCAAATTT
1651 ACGTTTTACT AATGGTAAGG GTGAGGGTCA TAGGGCAGGT TTTGGGTTTT
1701 CTAGTACTGT TGAAAACTGC AAGTATTGGC TATTTGTATA CTTAGCCATA
1751 ACTTGGTGAA AAAAAACCTG AGCAGTGTCT ATGTATTAAT GCGTTGGAAA
1801 GAAAGCTGCT TGTGTTTGCT TTGTTAATTG CCTCAGGATA TTTCTTTTAA
1851 AATAAGCTGT TTTAAGAGGA ACAGAAGGGA AATCTGCTAC CTAGTCTATA
1901 CACAGCGTGA ACCTCACAGG GGGCTTCTGA TACCCTCAAA CATGGAGAAC
1951 AGTAAGGGAG CAGAGTGGTT AAGGACTTTC AGGAACTTAA CTATTCTGGA
2001 ATAAGGAATG AATCAACTGA CCTTGGGCCA GCAGGTTTTT AACTAAATTG
2051 TTACTTGCCT TTCTCACCCA GTTAATCAGT CTCTGTACTT GTTTCCCTTT
2101 TTGAAACAAG TGTCTTGGTT AACTAATTCT GTTTTATGGT TGTGCTAAAT
2151 TCATAGCAGG TGCCTTATTC TTTGCTTTTA GTCAAACCAT TCCATATCAG
2201 AATTTTCCTT GGTTTACTAT AGATATTTGG CTTTAAGTTG TTGTTTGTGT
2251 TTTTTAATGT ACAATGTTCT GATAAATTTG ACTGTTAAAT TGCTATAGCT
2301 AGCAATCATT TTACATATGT AAAAAATTGC ATTCCCTTTG TATTTCATGT
2351 GTAATTCACC AATTAAGTGC AGTTTATATT CAGGTTGGAT TATGCATGTT
2401 TAGGTAAACG AAAGCTGTGT CTTACTTGAT TTATTCTTTA AAAATAAAGT
2451 TCCCTGAATA TTTGAAAAAA AAAAAAAAAA AAAAAAAA
BLAST Results



Entry HSCDN13 from database EMBL: 

H. sapiens (TL5) mRNA from LNCaP cell line 

Score = 1075, P = 5.8e-41, identities = 219/221 

Entry AF100470_1 from database TREMBLNEW: 

gene: "RAMP4"; product: "ribosome attached membrane protein 4"; Rattus
norvegicus ribosome attached membrane protein 4 (RAMP4) mRNA, complete
cds.

Score = 331, P = 3.9e-28, identities = 66/66, positives = 66/66, frame
+1

Entry HSG19910 from database EMBL:
human STS A002B48.
Score = 530, P = 2.1e-17, identities = 108/109 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 316 bp to 513 bp; peptide length: 66 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic



1 MVAKQRIRMA NEKHSKNITQ RGNVAKTSRN APEEKASVGP WLLALFIFVV
51 CGSAIFQIIQ SIRMGM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21l16, frame 1

TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated
membrane protein RAMP4"; Rattus norvegicus mRNA for ribosome
associated membrane protein RAMP4, N = 1, Score = 331, P = 6.2e-30

TREMBL:AF100470_1 gene: "RAMP4"; product: "ribosome attached membrane 
protein 4"; Rattus norvegicus ribosome attached membrane protein 4 
(RAMP4) mRNA, complete cds., N = 1, Score = 331, P = 6.2e-30 



>TREMBLNEW:RN0238236_1 gene: "ramp4"; product: "ribosome associated membrane 
protein RAMP4"; Rattus norvegicus mRNA for ribosome associated membrane 
protein RAMP4 

Length = 75 

HSPs: 

Score = 331 (49.7 bits), Expect = 6.2e-30, P = 6.2e-30 
Identities = 66/66 (100%), Positives = 66/66 (100%) 

Query: 1 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 60 

MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 
Sbjct: 10 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 69 

Query: 61 SIRMGM 66 

SIRMGM 
Sbjct: 70 SIRMGM 75 

No Pedant data available 
DKFZphtes3_21n23



group: testes derived 

DKFZphtes3_21n23 encodes a novel 817 amino acid protein with strong similarity to rat 7acomp
protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



strong similarity to rat 7acomp protein 
on genomic level encoded by AF107885 
Sequenced by LMU 
Locus: /map="14q24.3"
Insert length: 3122 bp

Poly A stretch at pos. 3070, polyadenylation signal at pos. 3045



1 GGAA7AACCTC GTGGGCTCAG CCCGGGAGAA AGGGCCAGGG AAGTTGGGTG 
51 GTTCTGTGCT TGGTCTGTCA ATGGAGGAGA TCAAAGTTTT ACGAAGGGTG 
101 AAGGAGGAGA ATGATCGGCG AGGTGGATTT ATTCGCATAT TTCCTACATC 
151 TGAGACATGG GAAATATATG GGTCCTACCT CGAGCATAAG ACCTCAATGA 
201 ACTATATGCT GGCAACACGC CTCTTCCAGG ACAGGGGAAA CCCAAGAAGA 
251 AGCTTATTGA CAGGAAGAAC ACGAATGACT GCTGATGGAG CGCCAGAATT 
301 GAAGATAGAG AGTCTGAATT CAAAGGCCAA GCTGCATGCT GCACTTTACG 
351 AGAGGAAGCT CCTGTCTCTG GAGGTGCGAA AACGTAGACG ACGGAGTAGC 
401 AGATTGAGGG CAATGAGGCC AAAATACCCA GTGATTACCC AACCAGCTGA 
451 AATGAATGTT AAAACTGAGA CAGAGAGTGA AGAGGAGGAA GAAGTCGCAT
501 TAGATAATGA AGATGAAGAA CAGGAGGCTT CCCAGGAGGA GTCTGCAGGA 
551 TTTCTTAGAG AAAATCAAGC CAAATATACA CCCTCATTGA CAGCTTTGGT 
601 AGAAAATACA CCCAAAGAAA ATTCCATGAA AGTTCGTGAA TGGAATAATA 
651 AAGGTGGACA CTGCTGCAAA CTTGAGACTC AGGAGCTAGA GCCTAAATTT 
701 AACCTGATGC AGATTCTTCA AGATAATGGC AATCTTAGCA AAATGCAGGC 
751 CCGAATAGCA TTCTCTGCCT ATCTCCAGCA TGTTCAAATT CGCCTGATGA 
801 AAGACAGTGG CGGTCAGACG TTCAGTGCCA GTTGGGCTGC CAAAGAGGAT 
851 GAACAGATGG AGCTGGTTGT TCGTTTCCTC AAGCGAGCAT CAAATAACCT 
901 CCAGCATTCA CTGAGGATGG TATTACCCAG TCGACGATTG GCACTTCTGG 
951 AACGCAGAAG AATCCTGGCC CACCAGCTGG GTGACTTTAT CATTGTATAC 
1001 AACAAGGAAA CAGAACAAAT GGCTGAAAAG AAATCAAAGA AGAAAGTTGA 
1051 GGAAGAAGAG GAAGATGGGG TGAATATGGA AAACTTTCAG GAGTTCATCA 
1101 GACAAGCAAG TGAGGCTGAA CTGGAGGAGG TGTTGACTTT TTATACCCAA 
1151 AAGAACAAGT CTGCTAGTGT CTTCCTGGGG ACTCACTCTA AAATTTCTAA 
1201 GAACAACAAC AATTATTCTG ATAGTGGGGC AAAAGGTGAT CACCCTGAGA 
1251 CTATAATGGA AGAAGTGAAA ATAAAGCCAC CTAAACAGCA ACAGACGACA 
1301 GAAATTCATT CTGATAAATT ATCTCGATTT ACCACTTCAG CAGAAAAAGA 
1351 GGCAAAATTA GTTTATAGCA ATTCCTCCTC TGGTCCTACT GCTACTCTGC 
1401 AGAAAATTCC CAACACCCAT TTGTCATCTG TTACAACCTC TGACCTCTCT 
1451 CCAGGGCCTT GCCACCATTC TTCTTTATCT CAAATTCCTT CAGCTATCCC 
1501 CAGCATGCCT CACCAGCCAA CAATTTTACT GAACACAGTC TCTGCCAGTG 
1551 CTTCTCCCTG CCTACATCCC GGGGCACAGA ACATCCCAAG CCCTACTGGC 
1601 CTGCCACGCT GTCGATCAGG AAGTCACACC ATTGGTCCCT TTTCTTCCTT 
1651 CCAAAGTGCT GCACACATCT ATAGCCAGAA ACTGTCTCGT CCCTCTTCAG 
1701 CAAAGGCAGG ATCGTGCTAT CTAAACAAGC ATCATTCAGG AATAGCCAAA 
1751 ACACAAAAAG AGGGAGAAGA TGCTTCTTTA TATAGCAAAC GGTACAACCA 
1801 AAGTATGGTT ACAGCTGAAC TTCAGCGGCT AGCTGAGAAG CAGGCAGCGA 
1851 GACAGTATTC TCCATCCAGC CACATCAACC TCCTCACCCA ACAGGTAACA 
1901 AACCTGAATT TGGCAACTGG CATCATAAAC AGAAGCAGTG CTTCAGCTCC 
1951 CCCAACCCTC CGACCCATCA TCAGTCCTAG TGGCCCGACA TGGTCTACAC 
2001 AGTCAGACCC CCAAGCTCCC GAGAATCACT CCAGCTCTCC TGGAAGCAGG 
2051 AGCCTGCAGA CAGGGGGATT TGCCTGGGAA GGAGAAGTAG AAAACAACGT 
2101 GTACAGCCAG GCTACAGGGG TGGTCCCCCA GCACAAGTAT CACCCCACAG 
2151 CAGGCAGCTA TCAGCTTCAA TTTGCCCTGC AGCAACTTGA ACAACAAAAA 
2201 CTTCAGTCCC GGCAGCTCCT GGACCAGAGT CGAGCCCGGC ACCAGGCAAT 
2251 CTTTGGCAGC CAGACACTAC CTAACTCCAA TTTATGGACA ATGAATAATG 
2301 GTGCAGGTTG TAGAATTTCC AGTGCCACAG CTAGTGGCCA GAAGCCAACC 
2351 ACTCTGCCAC AAAAAGTGGT ACCACCTCCA AGTTCTTGCG CCTCCCTGGT
2401 TCCCAAACCC CCACCCAACC ACGAACAAGT GCTCAGAAGG GCAACATCCC
2451 AGAAAGCTTC CAATACCCGC TTCAGATCCT CCTTTCAAAA CTATTTGTGG
2501 TATTTCTTCC AAGCAGTCAG CTGAACTGAG GACGACAGCC TACAAACAAC
2551 TACATGCATC TGAACTGTCT CTTGTAAATG AGCTTTTTTC AGAGCCAGAA
2601 TCATACTCTC CAGGAAATAT GGAGAAAGAA ACCTGAGGAG ATTGAAGTTT
2651 GCCAGGCACA AGGGCAAAAC TCAGACTGAA TGAATTTGAA AGGGTGGGGC
2701 CAAAGATGTT GTAACCTGGG AGACTTCTCT GAAGAAAGAA AACTGTTTAA
2751 GAAACACAGA CTGAACTGCA GTACTTTTCC TTAAATAGCT GAGATGACCT
2801 TCTTTACCCT GGGCTTAGGT GATTCTCATC AGGGTGACCT GAGTGGAAGT 
2851 TGGTGGTAAC GACTGTTCTG TGTCAGCACC CAGGACAGTG GTGTCTGTTA 
2901 AGGCTGCCAG GGATTAGCAG GGAGGAAAGC CATCAGGACT GGGTAGCCTG 
2951 GTAGCACCAA ATCCCAATTA ATGTTACCTG AACATGTGGT GAGGTCAGCC 
3001 GTATGATGAA AGATGTTTAA GAGATTAATG TCAGAAGAAT ATGAAAATAA
3051 ACACCGGCTT AAAAAATGTT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3101 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AF107885 from database EMBL:

Homo sapiens chromosome 14q24.3 clone BAC270M14 transforming growth 
factor-beta 3 (TGF-beta 3) gene, complete cds; and unknown genes. 
Score = 3042, P = 3.0e-219, identities = 610/612 
5 exons matching 1893-3070 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 71 bp to 2521 bp; peptide length: 817 
Category: strong similarity to known protein 



1 MEEIKVLRRV KEENDRRGGF IRIFPTSETW EIYGSYLEHK TSMNYMLATR
51 LFQDRGNPRR SLLTGRTRMT ADGAPELKIE SLNSKAKLHA ALYERKLLSL 
101 EVRKRRRRSS RLRAMRPKYP VITQPAEMNV KTETESEEEE EVALDNEDEE 
151 QEASQEESAG FLRENQAKYT PSLTALVENT PKENSMKVRE WNNKGGHCCK 
201 LETQELEPKF NLMQILQDNG NLSKMQARIA FSAYLQHVQI RLMKDSGGQT 
251 FSASWAAKED EQMELVVRFL KRASNNLQHS LRMVLPSRRL ALLERRRILA 
301 HQLGDFIIVY NKETEQMAEK KSKKKVEEEE EDGVNMENFQ EFIRQASEAE 
351 LEEVLTFYTQ KNKSASVFLG THSKISKNNN NYSDSGAKGD HPETIMEEVK 
401 IKPPKQQQTT EIHSDKLSRF TTSAEKEAKL VYSNSSSGPT ATLQKIPNTH 
451 LSSVTTSDLS PGPCHHSSLS QIPSAIPSMP HQPTILLNTV SASASPCLHP 
501 GAQNIPSPTG LPRCRSGSHT IGPFSSFQSA AHIYSQKLSR PSSAKAGSCY 
551 LNKHHSGIAK TQKEGEDASL YSKRYNQSMV TAELQRLAEK QAARQYSPSS 
601 HINLLTQQVT NLNLATGIIN RSSASAPPTL RPIISPSGPT WSTQSDPQAP 
651 ENHSSSPGSR SLQTGGFAWE GEVENNVYSQ ATGVVPQHKY HPTAGSYQLQ 
701 FALQQLEQQK LQSRQLLDQS RARHQAIFGS QTLPNSNLWT MNNGAGCRIS 
751 SATASGQKPT TLPQKVVPPP SSCASLVPKP PPNHEQVLRR ATSQKASNTR 
801 FRSSFQNYLW YFFQAVS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_21n23, frame 2 

TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds., N = 1, Score = 1845, P = 2.2e-190 

TREMBL:AF107885_3 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 443, P = 5.3e-41 

TREMBL:AF107885_4 product: "unknown"; Homo sapiens chromosome 14q24.3 
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene, 
complete cds; and unknown genes., N = 1, Score = 265, P = 8.2e-22 



>TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein 
mRNA, complete cds. 

Length = 436 



HSPs: 



Score = 1845 (276.8 bits), Expect = 2.2e-190, P = 2.2e-190 
Identities = 369/435 (84%), Positives = 395/435 (90%) 
Query: 115 
       MRPKYPVIT PAEMN+KTETESEEEEEV LDNEDEEQEASQEESAG L ENQAKYTPSLT 
Sbjct: 1 

Query: 175 
       +VEN+P+EN+MKV EW NKG CCK+ETQE E KFNLMQILQDNGNLSK+QAR+AFSAY 
Sbjct: 61 

Query: 235 
       LQHVQ+RL KDSGGQT S SWAAKEDEQMELVVRFLKRAS+NLQHSLRMVLPSRRLALLE 
Sbjct: 121 

Query: 295 
       RRRILAHQLGDFI+VYNKETEQMAEKKSKKK+EEEEEDGVN E+FQEFIRQASEAELEEV 
Sbjct: 181 

Query: 355 
       LTFYTQKNKSASVFLGTHSK SKN+++YSDSGAKGDHPETI +EVKIK PKQQQ TEIHS 
Sbjct: 241 

Query: 415 
       DKLSRFTTSA KEAKLVY+N SS GP A L Q++P+THLSS+ TTS LS GP HHSSLS 
Sbjct: 300 

Query: 471 
       QI AIPSMPHQ +LLN V SASP +HPG N+ SP GLPRCRSGS+TIGPFSSFQSA 
Sbjct: 360 

Query: 531 
       AHIYSQKLSRPSSAKAG 
Sbjct: 419 
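The percentages printed in the HSP header for this hit (369/435 identities reported as 84%, 395/435 positives reported as 90%) are consistent with truncation rather than rounding. The sketch below only reproduces that observation from the numbers shown in this listing; it is not a statement about how the BLAST program itself is implemented, and `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches, aligned):
    """Integer percentage obtained by truncation (floor division)."""
    return (100 * matches) // aligned


# Values taken from the HSP header above.
identity_pct = blast_percent(369, 435)
positive_pct = blast_percent(395, 435)
print(identity_pct, positive_pct)
```

Conventional rounding would give 85% and 91% for the same counts, which is why truncation is assumed here.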



Pedant information for DKFZphtes3_21n23, frame 2 
Report for DKFZphtes3_21n23.2 



[LENGTH] 817 

[MW] 91522.09 

[pi] 9.32 

[HOMOL] TREMBL:AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein mRNA, complete cds. 1e-166 

[PROSITE] MYRISTYL 6 

[PROSITE] CAMP_PHOSPHO_SITE 4 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 15 

[PROSITE] ASN_GLYCOSYLATION 7 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 13.83 % 



SEQ MEEIKVLRRVKEENDRRGGFIRIFPTSETWEIYGSYLEHKTSMNYMLATRLFQDRGNPRR 

SEG 

PRD ccchhhhhhhhhhhccccceeeecccccceeeecceeeecccchhhhhhhhhhhcccccc 

SEQ SLLTGRTRMTADGAPELKIESLNSKAKLHAALYERKLLSLEVRKRRRRSSRLRAMRPKYP 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeeccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLTALVENT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ceeeccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhccccceeeeeccc 

SEQ PKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAYLQHVQI 

SEG 

PRD cccccceeeeeccccccccchhhhhhhccchhhhhhhcccchhhhhhhhhhhhhhhhhhh 

SEQ RLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLERRRILA 

SEG xxxxxxxxxxxxxxx . 

PRD hhhhcccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhh 

SEQ HQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEVLTFYTQ 

SEG xxxxxxxxxxxxx 

PRD hhccceeeeeehhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHSDKLSRF 

SEG 

PRD ccccceeeecccccccccccccccccccccccchhhhhhhccccccceeeeecccccccc 

SEQ TTSAEKEAKLVYSNSSSGPTATLQKIPNTHLSSVTTSDLSPGPCHHSSLSQIPSAIPSMP 

SEG 

PRD hhhhhhhheeeecccccccceeeecccccccccccccccccccccccccccccccccccc 

SEQ HQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSAAHIYSQKLSR 

SEG 

PRD cccceeeeccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccc 

SEQ PSSAKAGSCYLNKHHSGIAKTQKEGEDASLYSKRYNQSMVTAELQRLAEKQAARQYSPSS 

SEG 

PRD cccccccceeeecccccccccccccccceeeecchhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ HINLLTQQVTNLNLATGIINRSSASAPPTLRPIISPSGPTWSTQSDPQAPENHSSSPGSR 

SEG . . xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccceeeecccccccccccccccccccccccccc 

SEQ SLQTGGFAWEGEVENNVYSQATGVVPQHKYHPTAGSYQLQFALQQLEQQKLQSRQLLDQS 

SEG xxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RARHQAIFGSQTLPNSNLWTMNNGAGCRISSATASGQKPTTLPQKVVPPPSSCASLVPKP 

SEG 

PRD hhhhhhhhccccccccceeeeccccceeeeeeeccccccccccceeecccccceeecccc 

SEQ PPNHEQVLRRATSQKASNTRFRSSFQNYLWYFFQAVS 

SEG 

PRD cccchhhhhhhhhhhcccccccccccceeeeeeeccc 
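The LOW_COMPLEXITY keyword value in the Pedant report can be related to the SEG rows above, where masked residues are marked with `x`. The sketch below is an illustration only: the run lengths (20, 33, 15, 13, 12 and 20) are a reading of the `x` runs as they appear in this listing and should be treated as an assumption, as should the helper name `low_complexity_percent`.

```python
def low_complexity_percent(seg_rows, protein_length):
    """Percentage of residues masked with 'x' across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / protein_length, 2)


# Assumed run lengths of the 'x' stretches in the SEG rows of this entry.
seg_rows = ["x" * n for n in (20, 33, 15, 13, 12, 20)]
pct = low_complexity_percent(seg_rows, 817)
print(pct)
```

With these assumed run lengths (113 masked residues out of 817) the result agrees with the 13.83 % given in the report.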



Prosite for DKFZphtes3_21n23.2 



PS00001   221->225   ASN_GLYCOSYLATION   PDOC00001 
PS00001   362->366   ASN_GLYCOSYLATION   PDOC00001 
PS00001   381->385   ASN_GLYCOSYLATION   PDOC00001 
PS00001   434->438   ASN_GLYCOSYLATION   PDOC00001 
PS00001   576->580   ASN_GLYCOSYLATION   PDOC00001 
PS00001   620->624   ASN_GLYCOSYLATION   PDOC00001 
PS00001   652->656   ASN_GLYCOSYLATION   PDOC00001 
PS00004   106->110   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   107->111   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   271->275   CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   789->793   CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   64->67     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   109->112   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   180->183   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   185->188   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   280->283   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   287->290   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   322->325   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   359->362   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   414->417   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   535->538   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   543->546   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   561->564   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   572->575   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   629->632   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   793->796   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   35->39     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   132->136   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   134->138   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   136->140   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   154->158   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   347->351   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   394->398   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   422->426   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   455->459   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   561->565   CK2_PHOSPHO_SITE    PDOC00006 
PS00006   643->647   CK2_PHOSPHO_SITE    PDOC00006 
PS00007   563->572   TYR_PHOSPHO_SITE    PDOC00007 
PS00008   195->201   MYRISTYL            PDOC00008 
PS00008   248->254   MYRISTYL            PDOC00008 
PS00008   510->516   MYRISTYL            PDOC00008 
PS00008   557->563   MYRISTYL            PDOC00008 
PS00008   746->752   MYRISTYL            PDOC00008 
PS00008   756->762   MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphtes3_21n23.2) 
DKFZphtes3_22c23 



group: testes derived 

DKFZphtes3_22c23 encodes a novel 223 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, 3 EST hits (two from a testis library) 
Sequenced by LMU 
Locus: /map="9q34" 
Insert length: 1113 bp 

Poly A stretch at pos. 1073, polyadenylation signal at pos. 1055 

1 GGTGGGCAAA GGCATCTTCC TCTGGGAAGG ACTGGCACAA GCACTTGGTC 
51 CCTGGGTTGT GTGCCTGGGA GGCCGGGATC AGGGCTGGCC CTCTTTCTCC 

101 CTGGCAAAGC AAAACCTCCC TTTTACTACT ATCAAGGGGA AGTAACTTGA 

151 AGGTGCCTGT GGCAGGCAGC ACCTTGAGCC AACAGGAACC ATTGACATGC 

201 GAGGCCCAGG GCAGGCAGAC TGTGCAGTGG CCATTGGGCG GCCCCTCGGG 

251 GAGGTGGTGA CCCTCCGCGT CCTTGAGAGT TCTCTCAACT GCAGTGCGGG 

301 GGACATGTTG CTGCTTTGGG GCCGGCTCAC CTGGAGGAAG ATGTGCAGGA 

351 AGCTGTTGGA CATGACTTTC AGCTCCAAGA CCAACACGCT GGTGGTGAGG 

401 CAGCGCTGCG GGCGGCCAGG AGGTGGGGTG CTGCTGCGGT ATGGGAGCCA 

451 GCTTGCTCCT GAAACCTTCT ACAGAGAATG TGACATGCAG CTCTTTGGGC 

501 CCTGGGGTGA AATCGTGAGC CCCTCGCTGA GTCCAGCCAC GAGTAATGCA 

551 GGGGGCTGCC GGCTCTTCAT TAATGTGGCT CCGCACGCAC GGATTGCCAT 

601 CCATGCCCTG GCCACCAACA TGGGCGCTGG GACCGAGGGA GCCAATGCCA 

651 GCTACATCTT GATCCGGGAC ACCCACAGCT TGAGGACCAC AGCGTTCCAT 

701 GGGCAGCAGG TGCTCTACTG GGAGTCAGAG AGCAGCCAGG CTGAGATGGA 

751 GTTCAGCGAG GGCTTCCTGA AGGCTCAGGC CAGCCTGCGG GGCCAGTACT 

801 GGACCCTCCA ATCATGGGTA CCGGAGATGC AGGACCCTCA GTCCTGGAAG 

851 GGAAAGGAAG GAACCTGAGG GTCATTGAAC ATTTGTTCCG TGTCTGGCCA 

901 GCCCTGGAGG GTTGACCCCT GGTCTCAGTG CTTTCCAATT CGAACTTTTT 

951 CCAATCTTAG GTATCTACTT TAGAGTCTTC TCCAATGTCC AAAAGGCTAG 
1001 GGGGTTGGAG GTGGGGACTC TGGAAAAGCA GCCCCCATTT CCTCGGGTAC 
1051 CAATAAATAA AACATGCAGG CTGAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1101 AAAAAAAAAA AAA 



BLAST Results 



Entry HSAC1644 from database EMBL: 

Genomic sequence from Human 9q34, complete sequence. 
Score = 2072, P = 8.8e-225, identities = 422/430 
5 exons Bp 41969-38232 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 197 bp to 865 bp; peptide length: 223 
Category: putative protein 



1 MRGPGQADCA VAIGRPLGEV VTLRVLESSL NCSAGDMLLL WGRLTWRKMC 

51 RKLLDMTFSS KTNTLVVRQR CGRPGGGVLL RYGSQLAPET FYRECDMQLF 

101 GPWGEIVSPS LSPATSNAGG CRLFINVAPH ARIAIHALAT NMGAGTEGAN 

151 ASYILIRDTH SLRTTAFHGQ QVLYWESESS QAEMEFSEGF LKAQASLRGQ 
201 YWTLQSWVPE MQDPQSWKGK EGT 
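The relationship between the nucleotide listing and the frame 2 peptide can be illustrated by translating the first codons of the ORF with the standard genetic code. The sketch below is a minimal illustration: the 24 nucleotides are the bases at positions 197 to 220 of the sequence above (assuming 1-based numbering), and `translate` is a hypothetical helper, not part of any tool named in this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Positions 197-220 of the insert: end of the block at 191 plus the blocks at 201 and 211.
orf_start = "ATGC" + "GAGGCCCAGG" + "GCAGGCAGAC"
peptide_start = translate(orf_start)
print(peptide_start)
```

The translated residues correspond to the first eight residues of the peptide listed above.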
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22c23, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_22c23, frame 2 



Report for DKFZphtes3_22c23.2 



[LENGTH] 223 

[MW] 24546.19 

[pi] 8.57 

[PROSITE] MYRISTYL 4 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 6 

[PROSITE] ASN_GLYCOSYLATION 2 

[KW] Alpha_Beta 



SEQ MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS 

PRD ccccccccceeeecccccceeeeehhhhhcccccchhhhhhchhhhhhhhhhhhhhhccc 

SEQ KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG 

PRD ccceeeeeecccccccceeeeccccccchhhhhhhhhccccccceeeecccccccccccc 

SEQ CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS 

PRD ceeeeeecccceeehhhhhhhhccccccccceeeeeecccccceeecccceeeeeccccc 

SEQ QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT 

PRD hhhhhhhcchhhhhhhhhhcccccccccccccccccccccccc 



Prosite for DKFZphtes3_22c23.2 



PS00001   31->35     ASN_GLYCOSYLATION   PDOC00001 
PS00001   150->154   ASN_GLYCOSYLATION   PDOC00001 
PS00005   22->25     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   45->48     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   59->62     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   161->164   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   196->199   PKC_PHOSPHO_SITE    PDOC00005 
PS00005   216->219   PKC_PHOSPHO_SITE    PDOC00005 
PS00006   33->37     CK2_PHOSPHO_SITE    PDOC00006 
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006 
PS00008   5->11      MYRISTYL            PDOC00008 
PS00008   145->151   MYRISTYL            PDOC00008 
PS00008   148->154   MYRISTYL            PDOC00008 
PS00008   199->205   MYRISTYL            PDOC00008 



(No Pfam data available for DKFZphtes3_22c23.2) 
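The short Prosite motifs reported for this peptide can be reproduced with a simple pattern scan. The sketch below is an illustration under stated assumptions: it uses the published PROSITE consensus patterns for PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]) written as regular expressions, reports 1-based start positions only, and uses a hypothetical helper name `scan`. It is not the tool that produced the report.

```python
import re

# Frame 2 peptide of DKFZphtes3_22c23, as listed in this entry.
PEPTIDE = (
    "MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMC"
    "RKLLDMTFSSKTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLF"
    "GPWGEIVSPSLSPATSNAGGCRLFINVAPHARIAIHALATNMGAGTEGAN"
    "ASYILIRDTHSLRTTAFHGQQVLYWESESSQAEMEFSEGFLKAQASLRGQ"
    "YWTLQSWVPEMQDPQSWKGKEGT"
)

# PROSITE consensus patterns rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan(seq, pattern):
    """Return 1-based start positions of all matches, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer("(?=(" + pattern + "))", seq)]


hits = {name: scan(PEPTIDE, pat) for name, pat in PATTERNS.items()}
for name, starts in hits.items():
    print(name, starts)
```

The lookahead form is used so that overlapping sites are not skipped. The start positions found this way match the start positions in the Prosite table for this entry; the end positions printed in the table appear to follow a different convention and are not reproduced here.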
DKFZphtes3_22g2 



group: nucleic acid management 

DKFZphtes3_22g2 encodes a novel 1230 amino acid protein that is nearly identical to rat TIP120. 

The TATA-binding protein (TBP) is a central component of transcriptional regulation and is a target 
for various transcription regulators. TBP-interacting protein 120 (TIP120) is a protein 
that interacts with TBP. The novel protein is the human ortholog of 
rat TIP120. The novel TBP-binding protein is considered to participate in transcription 
regulation through its interaction with TBP. 

The new protein can find application in modulation of gene transcription. 



KIAA0829, complete cds, nearly identical to rat TIP120 
complete cDNA, complete cds, EST hits. 



Sequenced by LMU 



Locus: /map="387.3 cR from top of Chr12 linkage group" 
Insert length: 5387 bp 

Poly A stretch at pos. 5352, polyadenylation signal at pos. 5335 



1 GGGAGCGAGT GCGGAGCGAG TGGGAGCGAG ACGGCCCTGA GTGGAAGTGT 
51 CTGGCTCCCC GTAGAGGCCC TTCTGTACGC CCCGCCGCCC ATGAGCTCGT 
101 TCTCACGCGA ACAGCGCCGT CGTTAGGCTG GCTCTGTAGC CTCGGCTTAC 
151 CCCGGGACAG GCCCACGCCT CGCCAGGGAG GGGGCAGCCC GTCGAGGCGC 
201 CTCCCTAGTC AGCGTCGGCG TCGCGCTGCG ACCCTGGAAG CGGGAGCCGC 
251 CGCGAGCGAG AGGAGGAGCT CCAGTGGCGG CGGCGGCGGC GGCAGCGGCA 
301 GCGGGCAGCA GCTCCAGCAG CGCCAGCAGG CGGGATCGAG GCCGTCAACA 
351 TGGCGAGCGC CTCGTACCAC ATTTCCAATT TGCTGGAAAA AATGACATCC 
401 AGCGACAAGG ACTTTAGGTT TATGGCTACA AATGATTTGA TGACGGAACT 
451 GCAGAAAGAT TCCATCAAGT TGGATGATGA TAGTGAAAGG AAAGTAGTGA 
501 AAATGATTTT GAAGTTATTG GAAGATAAAA ATGGAGAGGT ACAGAATTTA 
551 GCTGTCAAAT GTCTTGGTCC TTTAGTGAGT AAAGTGAAAG AATACCAAGT 
601 AGAGACAATT GTAGATACCC TCTGCACTAA CATGCTTTCT GATAAAGAAC 
651 AACTTCGAGA CATTTCAAGT ATTGGTCTTA AAACAGTAAT TGGAGAACTT 
701 CCTCCAGCTT CCAGTGGCTC TGCATTAGCT GCTAATGTAT GTAAAAAGAT 
751 TACTGGACGT CTTACAAGTG CAATAGCAAA ACAGGAAGAT GTCTCTGTTC 
801 AGCTAGAAGC CTTGGATATT ATGGCTGATA TGTTGAGCAG GCAAGGAGGA 
851 CTTCTTGTTA ATTTCCATCC TTCAATTCTG ACCTGTCTAC TTCCCCAGTT 
901 GACCAGCCCT AGACTTGCAG TGAGGAAAAG AACCATTATC GCTCTTGGCC 
951 ATCTGGTTAT GAGCTGTGGA AATATAGTTT TTGTAGATCT TATTGAACAT 
1001 CTGTTGTCAG AGTTGTCCAA AAATGATTCT ATGTCAACAA CAAGAACCTA 
1051 CATACAATGT ATTGCTGCTA TTAGTAGGCA AGCTGGTCAT AGAATAGGTG 
1101 AATACCTTGA GAAGATAATT CCTTTGGTGG TAAAATTTTG CAATGTAGAT 
1151 GATGATGAAT TAAGAGAGTA CTGTATTCAA GCCTTTGAAT CATTTGTAAG 
1201 AAGATGTCCT AAGGAAGTAT ATCCTCATGT TTCTACCATT ATAAATATTT 
1251 GTCTTAAATA TCTTACCTAT GATCCAAATT ATAATTACGA TGATGAAGAT 
1301 GAAGATGAAA ATGCAATGGA TGCTGATGGT GGTGATGATG ATGATCAAGG 
1351 GAGTGATGAT GAATACAGTG ATGATGATGA CATGAGTTGG AAAGTGAGAC 
1401 GTGCAGCTGC GAAGTGCTTG GATGCTGTAG TTAGCACAAG GCATGAAATG 
1451 CTTCCAGAAT TCTACAAGAC CGTCTCTCCT GCACTAATAT CCAGATTTAA 
1501 AGAGCGTGAA GAGAATGTAA AGGCAGATGT TTTTCACGCA TACCTTTCTC 
1551 TTTTGAAGCA AACTCGTCCT GTACAAAGTT GGCTATGTGA CCCTGATGCA 
1601 ATGGAGCAGG GAGAAACACC TTTAACAATG CTTCAGAGTC AGGTTCCCAA 
1651 CATTGTTAAA GCTCTTCACA AACAGATGAA AGAAAAAAGT GTGAAGACCC 
1701 GACAGTGTTG TTTTAACATG TTAACTGAGC TGGTAAATGT ATTACCTGGG 
1751 GCCCTAACTC AACACATTCC TGTACTTGTA CCAGGAATCA TTTTCTCACT 
1801 GAATGATAAA TCAAGCTCAT CGAATTTGAA GATCGATGCT TTGTCATGTC 
1851 TATACGTAAT CCTCTGTAAC CATTCTCCTC AAGTCTTCCA TCCTCACGTT 
1901 CAGGCTTTGG TTCCTCCAGT GGTGGCTTGT GTTGGAGACC CATTTTACAA 
1951 AATTACATCT GAAGCACTTC TTGTTACTCA ACAGCTTGTC AAAGTAATTC 
2001 GTCCTTTAGA TCAGCCTTCC TCGTTTGATG CAACTCCTTA TATCAAAGAT 
2051 CTATTTACCT GTACCATTAA GAGATTAAAA GCAGCTGACA TTGATCAGGA 
2101 AGTCAAGGAA AGGGCTATTT CCTGTATGGG ACAAATTATT TGCAACCTTG 
2151 GAGACAATTT GGGTTCTGAC TTGCCTAATA CACTTCAGAT TTTCTTGGAG 
2201 AGACTAAAGA ATGAAATTAC CAGGTTAACT ACAGTAAAGG CATTGACACT 
2251 GATTGCTGGG TCACCTTTGA AGATAGATTT GAGGCCTGTT CTGGGAGAAG 
2301 GGGTTCCTAT CCTTGCTTCA TTTCTTAGAA AAAACCAGAG AGCTTTGAAA 
2351 CTGGGTACTC TTTCTGCCCT TGATATTCTA ATAAAAAACT ATAGTGACAG 
2401 CTTGACAGCT GCCATGATTG ATGCAGTTCT AGATGAGCTC CCACCTCTTA 
2451 TCAGCGAAAG TGATATGCAT GTTTCACAAA TGGCCATCAG TTTTCTTACC 
2501 ACTTTGGCAA AAGTATATCC CTCCTCCCTT TCAAAGATAA GTGGATCCAT 
2551 TCTCAATGAA CTTATTGGAC TTGTGAGATC ACCCTTATTG CAGGGGGGAG 
2601 CTCTTAGTGC CATGCTAGAC TTTTTCCAAG CTCTGGTTGT CACTGGAACA 
2651 AATAATTTAG GATACATGGA TTTGTTGCGC ATGCTGACTG GTCCAGTTTA 
2701 CTCTCAGAGC ACAGCTCTTA CTCATAAGCA GTCTTATTAT TCCATTGCCA 
2751 AATGTGTAGC TGCCCTTACT CGAGCATGCC CTAAAGAGGG ACCAGCTGTA 
2801 GTAGGTCAGT TTATTCAAGA TGTCAAGAAC TCAAGGTCTA CAGATTCCAT 
2851 TCGTCTCTTA GCTCTACTTT CTCTTGGAGA AGTTGGGCAT CATATTGACT 
2901 TAAGTGGACA GTTGGAACTA AAATCTGTAA TACTAGAAGC TTTCTCATCT 
2951 CCTAGTGAAG AAGTCAAATC AGCTGCATCC TATGCATTAG GCAGCATTAG 
3001 TGTGGGCAAC CTTCCTGAAT ATCTGCCGTT TGTCCTGCAA GAAATAACTA 
3051 GTCAACCCAA AAGGCAGTAT CTTTTACTTC ATTCCTTGAA GGAAATTATT 
3101 AGCTCTGCAT CAGTGGTGGG CCTTAAACCA TATGTTGAAA ACATCTGGGC 
3151 CTTATTACTA AAGCACTGTG AGTGTGCAGA GGAAGGAACC AGAAATGTTG 
3201 TTGCTGAATG TCTAGGAAAA CTCACTCTAA TTGATCCAGA AACTCTCCTT 
3251 CCACGGCTTA AGGGGTACTT GATATCAGGC TCATCATATG CCCGAAGCTC 
3301 AGTGGTTACG GCTGTGAAAT TTACAATTTC TGACCATCCA CAACCTATTG 
3351 ATCCACTGTT AAAGAACTGC ATAGGTGATT TCCTAAAAAC TTTGGAAGAC 
3401 CCAGATTTGA ATGTGAGAAG AGTAGCCTTG GTCACATTTA ATTCAGCAGC 
3451 ACATAACAAG CCATCATTAA TAAGGGATCT ATTGGATACT GTTCTTCCAC 
3501 ATCTTTACAA TGAAACAAAA GTTAGAAAGG AGCTTATAAG AGAGGTAGAA 
3551 ATGGGTCCAT TTAAACATAC GGTTGATGAT GGTCTGGATA TTAGAAAGGC 
3601 AGCATTTGAG TGTATGTACA CACTTCTAGA CAGTTGTCTT GATAGACTTG 
3651 ATATCTTTGA ATTTCTAAAT CATGTTGAAG ATGGTTTGAA GGACCATTAT 
3701 GATATTAAGA TGCTGACATT TTTAATGTTG GTGAGACTGT CTACCCTTTG 
3751 TCCAAGTGCA GTACTGCAGA GGTTGGACCG ACTTGTTGAG CCATTACGTG 
3801 CAACATGTAC AACTAAGGTA AAGGCAAACT CAGTAAAGCA GGAGTTTGAA 
3851 AAACAAGATG AATTAAAGCG ATCTGCCATG AGAGCAGTAG CAGCACTGCT 
3901 AACCATTCCA GAAGCAGAGA AGAGTCCACT GATGAGTGAA TTCCAGTCAC 
3951 AGATCAGTTC TAACCCTGAG CTGGCGGCTA TCTTTGAAAG TATCCAGAAA 
4001 GATTCATCAT CTACTAACTT GGAATCAATG GACACTAGTT AGATGTTTGT 
4051 TCACCATGGG GACCATTACA TATGACCATA CAATGCACTG AATTGACAGG 
4101 TTAATCATAA GACATGGAAA GAGAAGTGTC TAAAAGCTTC AAAATGTTCC 
4151 ACTTTTTTTT CCTTCATGGA GACTGTTTGT TTGGCTTTCT TCCATTGTTG 
4201 TTTTTGTAGC ATTTATTTCA GAAATGTGTA TTTCCATAAT CCAGAGGTTG 
4251 TAAAACCACT AGTGTTTTAG TGGTTACAGC AACATTTGAA ATGGAAACTA 
4301 AAAGTTAGGA TTTTATGGAG TATGGAGATA GGGTCCAGTA TCTATTTACC 
4351 CTGTAATGTT TAGGATTAAA ATGTTAAAAT TTTGTGACCA TGAATTTCTT 
4401 TCTTTTATAA ATTTTCTCAT TTAAAAATCA AAAATCTTGC AAAACAAAAA 
4451 CCATGTTTCT TTTTCTTGTA TAACTTTTTG TTTTCAGCAA CATAAATTGA 
4501 TTTTTAGCTG GCAGACAAGA ATATCCATAT AAGATTTGTT AACCATTTCA 
4551 GAGAGTTTGG CAATTTTTAA AAGATAATAA GGTATCATTT TTAAGTATGA 
4601 AAATTAACAA TATCCCTGTT GCGCACACTA ATTTTGCATG AGTAAGTTTA 
4651 CAAATATGTA TCGTCTGTAA AGCAGCATGT GCAGATTATT CATAATATAG 
4701 AAGTTAAAAT AAGTATTAGT GCAATTTTCA GATATTTATT TTTGCACAGA 
4751 AAACACATTA TCTGGAGAGA AAGAAAGGAG AATTTTTGAG ACTTGGGTTT 
4801 TCTTAATGCC AGTGTGAATT TGCAGATGTT TTCAGAAAAT CAAGTCACAG 
4851 TAACAATTTG CCACTTTTTT CTATTATAAA TCTTCTTACT TAAATTTTGA 
4901 ATATTTAGTT TTTCTCAGTT ACCCATTTGT GTGTGTGTGA TTCCACTTAG 
4951 AAATTCTTAA AACCAGATTT TTCTTTCATT CCGTTTGGAT GTCTACATTC 
5001 CTTATCAAAG GATATAAATA CTGTGTATGC TTTTGAATTT TATTTTTAGG 
5051 AAAATTCTGA AGCCAGCTAT CACAGGTTTG TTAGCTAATA ATAGTATTTT 
5101 CTTTTAGTTG AGTTAGGTTT TTCCCCATCT CCTGTAGAGC GAATTTACAT 
5151 ATTGTATTGG GTAAGTGTTC ACTACTTTTC CTGATTAAGG GATCTGTGCT 
5201 GGGGAACAAA GCTTTTGCAG TACCTTATAT TGTAGTTAAA ATTTTATTTA 
5251 ACATATCCTT CAGTGAGCTC ATTTCACACT GTAGCCTCTT CCTTAAAATT 
5301 TGTGGTGCTC CTGTAACAGT AAGAACTAAT TCTGAAATAA AAGACATCTC 
5351 CTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry HS793345 from database EMBL: 
human STS WI-12457. 
Score = 1985, P = 1.3e-83, identities = 433/460 



Medline entries 



97127450: 

Molecular cloning of a novel 120-kDa TBP-interacting 
protein. 



Peptide information for frame 2 
ORF from 350 bp to 4039 bp; peptide length: 1230 
Category: known protein 
Classification: Nucleic acid management 



1 MASASYHISN LLEKMTSSDK DFRFMATNDL MTELQKDSIK LDDDSERKVV 

51 KMILKLLEDK NGEVQNLAVK CLGPLVSKVK EYQVETIVDT LCTNMLSDKE 

101 QLRDISSIGL KTVIGELPPA SSGSALAANV CKKITGRLTS AIAKQEDVSV 

151 QLEALDIMAD MLSRQGGLLV NFHPSILTCL LPQLTSPRLA VRKRTIIALG 

201 HLVMSCGNIV FVDLIEHLLS ELSKNDSMST TRTYIQCIAA ISRQAGHRIG 

251 EYLEKIIPLV VKFCNVDDDE LREYCIQAFE SFVRRCPKEV YPHVSTIINI 

301 CLKYLTYDPN YNYDDEDEDE NAMDADGGDD DDQGSDDEYS DDDDMSWKVR 

351 RAAAKCLDAV VSTRHEMLPE FYKTVSPALI SRFKEREENV KADVFHAYLS 

401 LLKQTRPVQS WLCDPDAMEQ GETPLTMLQS QVPNIVKALH KQMKEKSVKT 

451 RQCCFNMLTE LVNVLPGALT QHIPVLVPGI IFSLNDKSSS SNLKIDALSC 

501 LYVILCNHSP QVFHPHVQAL VPPVVACVGD PFYKITSEAL LVTQQLVKVI 

551 RPLDQPSSFD ATPYIKDLFT CTIKRLKAAD IDQEVKERAI SCMGQIICNL 

601 GDNLGSDLPN TLQIFLERLK NEITRLTTVK ALTLIAGSPL KIDLRPVLGE 

651 GVPILASFLR KNQRALKLGT LSALDILIKN YSDSLTAAMI DAVLDELPPL 

701 ISESDMHVSQ MAISFLTTLA KVYPSSLSKI SGSILNELIG LVRSPLLQGG 

751 ALSAMLDFFQ ALVVTGTNNL GYMDLLRMLT GPVYSQSTAL THKQSYYSIA 

801 KCVAALTRAC PKEGPAVVGQ FIQDVKNSRS TDSIRLLALL SLGEVGHHID 

851 LSGQLELKSV ILEAFSSPSE EVKSAASYAL GSISVGNLPE YLPFVLQEIT 

901 SQPKRQYLLL HSLKEIISSA SVVGLKPYVE NIWALLLKHC ECAEEGTRNV 

951 VAECLGKLTL IDPETLLPRL KGYLISGSSY ARSSVVTAVK FTISDHPQPI 

1001 DPLLKNCIGD FLKTLEDPDL NVRRVALVTF NSAAHNKPSL IRDLLDTVLP 

1051 HLYNETKVRK ELIREVEMGP FKHTVDDGLD IRKAAFECMY TLLDSCLDRL 

1101 DIFEFLNHVE DGLKDHYDIK MLTFLMLVRL STLCPSAVLQ RLDRLVEPLR 

1151 ATCTTKVKAN SVKQEFEKQD ELKRSAMRAV AALLTIPEAE KSPLMSEFQS 
1201 QISSNPELAA IFESIQKDSS STNLESMDTS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22g2, frame 2 

TREMBL:AB020636_1 gene: "KIAA0829"; product: "KIAA0829 protein"; Homo 
sapiens mRNA for KIAA0829 protein, partial cds., N = 1, Score = 5986, P = 0 

TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus 
mRNA for TIP120, complete cds., N = 1, Score = 6203, P = 0 



>TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA 
for TIP120, complete cds. 
Length = 1,230 

HSPs : 



Score = 6203 (930.7 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 1227/1230 (99%), Positives = 1228/1230 (99%) 



Query: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 
Sbjct: 1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60 

Query: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 
Sbjct: 61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120 

Query: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 
Sbjct: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180 

Query: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 
Sbjct: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240 

Query: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 
Sbjct: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300 

Query: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 
Sbjct: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360 

Query: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 

VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 
Sbjct: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420 

Query: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 
Sbjct: 421 GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 480 

Query: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 
Sbjct: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

Query: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 
Sbjct: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 600 

Query: 601 GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 
Sbjct: 601 GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

Query: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 
Sbjct: 661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

Query: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 
Sbjct: 721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

Query: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 
Sbjct: 781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

Query: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
Sbjct: 841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

Query: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 
Sbjct: 901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

Query: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 
Sbjct: 961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

Query: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

NVRRVALVTFNSAAHNKPSLIRDLLD+VLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 
Sbjct: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDSVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

Query: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 
Sbjct: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

Query: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 
Sbjct: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

Query: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 

QISSNPELAAIFESIQKDSSSTNLESMDTS 
Sbjct: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 



Pedant information for DKFZphtes3_22g2, frame 2 



Report for DKFZphtes3_22g2.2 



[LENGTH] 1230 

[MW] 136376.58 

[pi] 5.52 

[HOMOL] TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA for TIP120, complete cds. 0.0 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 



SEQ MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 

SEG 

PRD cccccchhhhhhhhhcccccceeeeehhhhhhhhhcccccccccchhhhhhhhhhhhhcc 

MEM 

SEQ NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 

SEG xxxx 

PRD ccccceeeeeeeeceeeeehhhhhhhhhhhhccchhhhhcccccccchhhhhhhhhcccc 

MEM 

SEQ SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 

SEG xxxxxxxx 

PRD cccccchhhhhhhccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeeecchhhhhh 

MEM 

SEQ LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 

SEG 

PRD hcccccchhhhhhhhhhhheeeeecccceeehhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 

SEG 

PRD hhhhcccccccchhhhhhhhheeeeccchhhhhhhhhhhhhhhhccccceeecchhhhhh 

MEM 

SEQ CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 

SEG 

PRD hhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeecccccccc 

MEM 

SEQ GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhccccccccceeeecce 

MEM 

SEQ IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 

SEG xxxxxxxxxxxxxxxx 

PRD eeeeccccccccchhhhhhhheeeeecccccccccceeeeecceeeeecccchhhhhhhh 

MEM 

SEQ LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 

SEG 

PRD hhhhhhhhhhcccccccccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhheeeecc 

MEM 

SEQ GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 

SEG 

PRD cccccccccchhhhhhhhhcchhhhhhhhhhhheeeeccccccccceeehhhhhhhhhhh 

MEM 

SEQ KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 

SEG 

PRD cccccceeecchhhhhhhhhhhccccccchhhhhhhhhhhheeeecccccchhhhhhhhc 

MEM 

SEQ GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhh 

MEM 

SEQ SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 

SEG 

PRD hccccccccccccccccceeeeeeccccchhhhhhhhhhhccccccccccchhhhhhhhh 

MEM 

SEQ SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 

SEG 

PRD cccchhhhhhhhhhhhhhcccceeehhhhhhhhhhhhhhhhcccccceeeeecccccccc 

MEM 

SEQ IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhccccccccccchhhhhhhhhhhccccc 

MEM 

SEQ NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 

SEG 

PRD ccceeeeeeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccch 

MEM 
SEQ IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 

SEG 

PRD hhhhhhhhhhhhhhhccccccceeeecccccccccchhhhhhhhhhhhhhhhcccchhhh 

MEM 

SEQ RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 

SEG 

PRD hhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhh 

MEM 

SEQ QISSNPELAAIFESIQKDSSSTNLESMDTS 

SEG 

PRD hhhccchhhhhhhhhhhccccccccccccc 

MEM 



(No Prosite data available for DKFZphtes3_22g2.2) 
(No Pfam data available for DKFZphtes3_22g2.2) 
DKFZphtes3_22n13 
group: testes derived 

DKFZphtes3_22n13 encodes a novel 677 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

dJ1042K10.3, complete 
Sequenced by LMU 
Locus: /map="22q13.1-13.2" 
Insert length: 3353 bp 

Poly A stretch at pos. 3315, polyadenylation signal at pos. 3298 



1 ATGGAACCAC TATCCCCACT GCCAAGTCCA CCCCCACACT CATTAAGCAA
51 AGCCAACCCA AGTCTGCCAG TGAGAAGTCA CAGCGCAGCA AGAAGGCCAA
101 GGAGCTGAAG CCAAAGGTGA AGAAGCTCAA GTACCACCAG TACATCCCCC
151 CGGACCAGAA GCAGGACAGG GGGGCACCCC CCATGGACTC ATCCTACGCC
201 AAGATCCTGC AGCAGCAGCA GCTCTTCCTC CAGCTGCAGA TCCTCAACCA
251 GCAGCAGCAG CAGCACCACA ACTACCAGGC CATCCTGCCT GCCCCGCCAA
301 AGTCAGCAGG CGAGGCCCTG GGAAGCAGCG GGACCCCCCC AGTACGCAGC
351 CTCTCCACTA CCAATAGCAG CTCCAGCTCG GGCGCCCCTG GGCCCTGTGG
401 GCTGGCACGT CAGAACAGCA CCTCACTGAC TGGCAAGCCG GGAGCCCTGC
451 CGGCCAACCT GGACGACATG AAGGTGGCAG AGCTGAAGCA GGAGCTGAAG
501 TTGCGATCAC TGCCTGTCTC GGGCACCAAA ACTGAGCTGA TTGAGCGCCT
551 TCGAGCCTAT CAAGACCAAA TCAGCCCTGT GCCAGGAGCC CCCAAGGCCC
601 CTGCCGCCAC CTCTATCCTC CACAAGGCTG GCGAGGTGGT GGTAGCCTTC
651 CCAGCGGCCC GGCTGAGCAC GGGGCCAGCC CTGGTGGCAG CAGGCCTGGC
701 TCCAGCTGAG GTGGTGGTGG CCACGGTGGC CAGCAGTGGG GTGGTGAAGT
751 TTGGCAGCAC GGGCTCCACG CCCCCCGTGT CTCCCACCCC CTCGGAGCGC
801 TCACTGCTCA GCACGGGCGA TGAAAACTCC ACCCCCGGGG ACACCTTTGG
851 TGAGATGGTG ACATCACCTC TGACGCAGCT GACCCTGCAG GCCTCGCCAC
901 TGCAGATCCT CGTGAAGGAG GAGGGCCCCC GGGCCGGGTC CTGTTGCCTG
951 AGCCCTGGGG GGCGGGCGGA GCTAGAGGGG CGCGACAAGG ACCAGATGCT
1001 GCAGGAGAAA GACAAGCAGA TCGAGGCGCT GACGCGCATG CTCCGGCAGA
1051 AGCAGCAGCT GGTGGAGCGG CTCAAGCTGC AGCTGGAGCA GGAGAAGCGA
1101 GCCCAGCAGC CCGCCCCCGC CCCCGCCCCC CTCGGCACCC CCGTGAAGCA
1151 GGAGAACAGC TTCTCCAGCT GCCAGCTGAG CCAGCAGCCC CTGGGCCCCG
1201 CTCACCCATT CAACCCCAGC CTGGCGGCCC CAGCCACCAA CCACATAGAC
1251 CCTTGTGCTG TGGCCCCAGG GCCCCCGTCC GTGGTGGTGA AGCAGGAAGC
1301 CTTGCAGCCT GAGCCCGAGC CGGTCCCCGC CCCCCAGTTG CTTCTGGGGC
1351 CTCAGGGCCC CGGCCTCATC AAGGGGGTTG CACCTCCCAC CCTCATCACC
1401 GACTCCACAG GGACCCACCT TGTCCTCACC GTGACCAATA AGAATGCAGA
1451 CAGCCCTGGC CTGTCCAGTG GGAGCCCCCA GCAGCCCTCG TCCCAGCCTG
1501 GCTCTCCAGC GCCTGCCCCC TCTGCCCAGA TGGACCTGGA GCACCCACTG
1551 CAGCCCCTCT TTGGGACCCC CACTTCTCTG CTGAAGAAGG AACCACCTGG
1601 CTATGAGGAA GCCATGAGCC AGCAGCCCAA ACAGCAGGAA AATGGTTCCT
1651 CAAGCCAGCA GATGGACGAC CTGTTTGACA TTCTCATTCA GAGCGGAGAA
1701 ATTTCAGCAG ATTTCAAGGA GCCGCCATCC CTGCCAGGGA AGGAGAAGCC
1751 ATCCCCGAAG ACAGTCTGTG GGTCCCCCCT GGCAGCACAG CCATCACCTT
1801 CTGCTGAGCT CCCCCAGGCT GCCCCACCTC CTCCAGGCTC ACCCTCCCTC
1851 CCTGGACGCC TGGAGGACTT CCTGGAGAGC AGCACGGGGC TGCCCCTGCT
1901 GACCAGTGGG CATGACGGGC CAGAGCCCCT TTCCCTCATT GACGACCTCC
1951 ATAGCCAGAT GCTGAGCAGC ACTGCCATCC TGGACCACCC CCCGTCACCC
2001 ATGGACACCT CGGAATTGCA CTTTGTTCCT GAGCCCAGCA GCACCATGGG
2051 CCTGGACCTG GCTGATGGCC ACCTGGACAG CATGGACTGG CTGGAGCTGT
2101 CGTCAGGTGG TCCCGTGCTG AGCCTAGCCC CCCTCAGCAC CACAGCCCCC
2151 AGCCTCTTCT CCACAGACTT CCTCGATGGC CATGATTTGC AGCTGCACTG
2201 GGATTCCTGC TTGTAGCTCT CTGGCTCAAG ACGGGGTGGG GAAGGGGCTG
2251 GGAGCCAGGG TACTCCAATG CGTGGCTCTC CTGCGTGATT CGGCCTCTCC
2301 ACATGGTTGT GAGTCTTGAC AATCACAGCC CCTGCTTTTT CCCTTCCCTG
2351 GGAGGCTAGA ACAGAGAAGC CCTTACTCCT GGTTCAGTGC CACGCAGGGC
2401 AGAGGAGAGC AGCTGTCAAG AAGCAGCCCT GGCTCTCACG CTGGGGTTTT
2451 GGACACACGG TCAGGGTCAG GGCCATTTCA GCTTGACCTC CTTTTTTGAG
2501 GTCAGGGGGC ACTGTCTGTC TGGCTACAAT TTGGCTAAGG TAGGTGAAGC
2551 CTGGCCAGGC GGGAGGCTTC TCTTCTGACC CAGGGCTGAG ACAGGTTAAG
2601 GGGTGAATCT CCTTCCTTTC TCTCCCTGCT TTGCTGTGAA GGGAGAAATT
2651 AGCCTGGGCC TCTACCCCCT ATTCCCTGTG TCTGCCAACC CCAGGATCCC
2701 AGGGCTCCCT GCCATTTTAG TGTCTTGGTG TAGTGTAACC ATTTAGTGGT
2751 TGGTGGCAAC AATTTTATGT ACAGGTGTAT ATACCTCTAT ATTATATATC
2801 GACATACATA TATATTTTTG GGGGGGGGCG GACAGGAGAT GGGTGCAACT
2851 CCCTCCCATC CTACTCTCAC AGAAGGGCCT GGATGCAAGG TTACCCTTGA
2901 GCTGTGTGCC ACAGTCTGGT GCCCAGTCTG GCATGCAGCT ACCCAGGCCC
2951 ACCCATCACG TGTGATTGAC ATGTAGGTAC CCTGCCACGG CCTATGCCCC
3001 ACCTGCCCTG CTTCCTGGCT CCTTATCAGT GCCATGAGGG CAGAGGTGCT
3051 ACCTGGCCTT CCTGCCAGGA GCTCTCCACC CACTCACATT CCGTCCCCGC
3101 CGCCTCACTG CAGCCAGCGT GGCCCTAGGA CAGGAGGAGC TTCGGGCCCA
3151 GCTTCACCCT GCGGTGGGGC TGAGGGGTGG CCATCTCCTG CCCTGGGGCC
3201 ACTGGCTTCA CATTCTGGGC TGACTCATAG GGGAGTAGGG GTGGAGTCAC
3251 CAAAACCAGT GCTGGGACAA AGATGGGGAA GGTGTGTGAA CTTTTTAAAA
3301 TAAACACAAA AACACAGGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3351 AAG



BLAST Results 



Entry HS1042K10 from database EMBL: 

Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2.
Contains the ADSL gene for Adenylosuccinate lyase (EC 4.3.2.2, 
Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Score = 7997, P = 0.0e+00, identities = 1617/1645 
7 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 183 bp to 2213 bp; peptide length: 677 
Category: similarity to unknown protein 
Classification: unclassified 
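
The relation between the ORF coordinates and the peptide length stated above can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the listing uses 1-based inclusive coordinates that exclude the stop codon (an assumption that fits the figures given here, since 2213 - 183 + 1 = 2031 = 3 x 677).

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Return the number of complete codons in an ORF.

    Assumes 1-based, inclusive coordinates (an assumption about the
    listing's convention, not something the listing states).
    """
    length_nt = end_bp - start_bp + 1
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return length_nt // 3


if __name__ == "__main__":
    # ORF from 183 bp to 2213 bp, as given in the listing
    print(orf_codon_count(183, 2213))
```

Under the same assumption the other entries in this section are also self-consistent, e.g. 15 to 572 bp gives 186 codons.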



1 MDSSYAKILQ QQQLFLQLQI LNQQQQQHHN YQAILPAPPK SAGEALGSSG
51 TPPVRSLSTT NSSSSSGAPG PCGLARQNST SLTGKPGALP ANLDDMKVAE
101 LKQELKLRSL PVSGTKTELI ERLRAYQDQI SPVPGAPKAP AATSILHKAG
151 EVVVAFPAAR LSTGPALVAA GLAPAEVVVA TVASSGVVKF GSTGSTPPVS
201 PTPSERSLLS TGDENSTPGD TFGEMVTSPL TQLTLQASPL QILVKEEGPR
251 AGSCCLSPGG RAELEGRDKD QMLQEKDKQI EALTRMLRQK QQLVERLKLQ
301 LEQEKRAQQP APAPAPLGTP VKQENSFSSC QLSQQPLGPA HPFNPSLAAP
351 ATNHIDPCAV APGPPSVVVK QEALQPEPEP VPAPQLLLGP QGPGLIKGVA
401 PPTLITDSTG THLVLTVTNK NADSPGLSSG SPQQPSSQPG SPAPAPSAQM
451 DLEHPLQPLF GTPTSLLKKE PPGYEEAMSQ QPKQQENGSS SQQMDDLFDI
501 LIQSGEISAD FKEPPSLPGK EKPSPKTVCG SPLAAQPSPS AELPQAAPPP
551 PGSPSLPGRL EDFLESSTGL PLLTSGHDGP EPLSLIDDLH SQMLSSTAIL
601 DHPPSPMDTS ELHFVPEPSS TMGLDLADGH LDSMDWLELS SGGPVLSLAP
651 LSTTAPSLFS TDFLDGHDLQ LHWDSCL

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_22nl3, frame 3

TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel
protein)"; Human DNA sequence from clone 1042K10 on chromosome
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable
rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs
and a putative CpG island., N = 1, Score = 1285, P = 4.9e-131

TREMBL:CEUK06A9_3 gene: "K06A9.1a"; Caenorhabditis elegans cosmid
K06A9., N = 2, Score = 149, P = 1.3e-09

TREMBLNEW:SSI132828_1 product: "p210 protein"; Spermatozopsis similis
mRNA for p210 protein, partial, N = 1, Score = 171, P = 2.8e-09



>TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel
protein)"; Human DNA sequence from clone 1042K10 on chromosome
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a
putative CpG island.
Length = 243

HSPs:

Score = 1285 (192.8 bits), Expect = 4.9e-131, P = 4.9e-131
Identities = 243/243 (100%), Positives = 243/243 (100%)

Query: 435 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 494
           PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM
Sbjct:   1 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 60

Query: 495 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 554
           DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP
Sbjct:  61 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 120

Query: 555 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 614
           SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF
Sbjct: 121 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 180

Query: 615 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 674
           VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD
Sbjct: 181 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 240

Query: 675 SCL 677
           SCL
Sbjct: 241 SCL 243



Pedant information for DKFZphtes3_22nl3, frame 3 



Report for DKFZphtes3_22nl3.3



[LENGTH] 677 

[MW] 70743.01 

[pi] 4.93 

[HOMOL] TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel protein)";
Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. Contains the ADSL gene for
Adenylosuccinate lyase (EC 4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with
probable rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a putative
CpG island, 1e-111



[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 21.57 %

[KW] COILED_COIL 4.58 % 

SEQ MDSSYAKILQQQQLFLQLQILNQQQQQHHNYQAILPAPPKSAGEALGSSGTPPVRSLSTT 

SEG xxxxxxxxxxxxxxxxxxx xxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhcceeeeeecccccceeeecccccccceeecccc 

COILS 

MEM 

SEQ NSSSSSGAPGPCGLARQNSTSLTGKPGALPANLDDMKVAELKQELKLRSLPVSGTKTELI 

SEG xxxxxx 

PRD cccccccccccceeecccccccccccccccccccchhhhhhhhhhhhhhcccccchhhhh 

COILS 

MEM 

SEQ ERLRAYQDQISPVPGAPKAPAATSILHKAGEVVVAFPAARLSTGPALVAAGLAPAEVVVA 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccccccceeeeeeeccceeeeccccccccccccccccccceeeeee 

COILS 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ TVASSGVVKFGSTGSTPPVSPTPSERSLLSTGDENSTPGDTFGEMVTSPLTQLTLQASPL 

SEG xxxxxxxx . . xxxxxxxxxxxxxx 

PRD eeecccccccccccccccccccccceeeeccccccccccccccceeecccceeeecccce 

COILS 

MEM M 

SEQ QILVKEEGPRAGSCCLSPGGRAELEGRDKDQMLQEKDKQIEALTRMLRQKQQLVERLKLQ 

SEG 

PRD eeeeeccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LEQEKRAQQPAPAPAPLGTPVKQENSFSSCQLSQQPLGPAHPFNPSLAAPATNHIDPCAV

SEG xxxxxxxxxx

PRD hhhhhhhhhcccccccccccccccccceeeeecccccccccccccceeeccccccccccc 

COILS CCCCCCC 

MEM 

SEQ APGPPSVVVKQEALQPEPEPVPAPQLLLGPQGPGLIKGVAPPTLITDSTGTHLVLTVTNK 

SEG xxxxxxxxxxxx 

PRD cccccceeeeeccccccccccccceeeccccccceeeeecccccccccccceeeeeeecc 

COILS 

MEM 

SEQ NADSPGLSSGSPQQPSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQ

SEG . . .xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccccccccccc 

COILS 

MEM 

SEQ QPKQQENGSSSQQMDDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPS 

SEG xxxxxxxxxxx 

PRD ccccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ AELPQAAPPPPGSPSLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAIL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhcccceee 

COILS 

MEM 

SEQ DHPPSPMDTSELHFVPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFS 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeeeccccceeeeeecccccccccc 

COILS 

MEM 

SEQ TDFLDGHDLQLHWDSCL 

SEG 

PRD cccccccceeecccccc 

COILS 

MEM 
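
The [KW] percentages reported above can be related to the per-residue annotation rows by counting marked positions and dividing by the protein length. The sketch below is only an illustration: the function name is hypothetical, and the run lengths are those read from the COILS and SEG rows of this report, which is an assumption because text extraction may have altered the runs.

```python
def coverage_percent(run_lengths, protein_length):
    """Percentage of residues covered by annotated runs, to 2 decimals."""
    return round(100 * sum(run_lengths) / protein_length, 2)


# Run lengths as read from the annotation rows above (assumed, not verified
# against the original Pedant output).
COILS_RUNS = [24, 7]
SEG_RUNS = [19, 5, 6, 17, 8, 14, 10, 12, 26, 11, 18]
PROTEIN_LENGTH = 677  # [LENGTH] field of the report

if __name__ == "__main__":
    print(coverage_percent(COILS_RUNS, PROTEIN_LENGTH))
    print(coverage_percent(SEG_RUNS, PROTEIN_LENGTH))
```

With these inputs the two values agree with the COILED_COIL and LOW_COMPLEXITY figures in the report, which suggests (but does not prove) that the percentages are simple residue fractions.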

(No Prosite data available for DKFZphtes3_22nl3.3)
(No Pfam data available for DKFZphtes3_22nl3.3)

DKFZphtes3_23111

group: intracellular transport and trafficking

DKFZphtes3_23111 encodes a novel 186 amino acid protein nearly identical to mouse
ADP-ribosylation-like factor homolog 6 (Arl6).

Protein secretion through the endoplasmic reticulum and the Golgi vesicular trafficking system
is initiated by the binding of ADP-ribosylation factors (ARFs) to donor membranes, leading to
recruitment of coatomer, bud formation, and eventual vesicle release. ARFs are approximately
20-kDa GTPases that are active with bound GTP and inactive with bound GDP. The novel protein
contains an ATP/GTP-binding site motif A (P-loop) and seems to be a novel ARF. It seems to have
an important role in vesicular transport and vesicular trafficking.

The new protein can find application in modulating vesicle transport and trafficking in cells.

nearly identical to mouse Arl6, ADP-ribosylation-like factor homolog
start at Bp 15 matches kozak consensus ANNatgG
Sequenced by LMU
Locus: unknown
Insert length: 717 bp

Poly A stretch at pos. 689, no polyadenylation signal found

1 ATTTGAATCA CATTATGGGA TTGCTAGACA GACTTTCAGT CTTGCTTGGC 
51 CTGAAGAAGA AGGAGGTTCA TGTTTTGTGC CTTGGGCTAG ATAATAGTGG 
101 CAAAACGACG ATCATTAACA AACTTAAACC TTCAAATGCT CAATCTCAAA 
151 ATATCCTTCC AACAATAGGA TTCAGCATAG AGAAATTCAA ATCATCCAGT 
201 TTGTCATTTA CAGTGTTTGA CATGTCAGGT CAAGGAAGAT ACAGAAATCT 
251 CTGGGAACAC TATTATAAAG AAGGCCAAGC TATTATTTTT GTCATTGATA 
301 GTAGTGATAG ATTAAGAATG GTTGTGGCCA AAGAAGAACT CGATACTCTT
351 CTGAATCATC CAGATATTAA ACACCGTCGA ATTCCAATCT TATTCTTTGC 
401 AAATAAAATG GATCTTAGAG ATGCAGTGAC ATCTGTAAAA GTGTCTCAGT 
451 TGCTGTGTTT AGAGAACATC AAAGATAAAC CCTGGCATAT TTGTGCTAGT 
501 GATGCCATAA AAGGAGAAGG CTTGCAAGAA GGTGTAGACT GGCTTCAAGA 
551 TCAGATCCAG ACTGTGAAGA CATGAAAAGA TAATAGTTGG AAACCTCAGC 
601 AATTTTCAAT TCAAGGAATC TATCTAAGAC AAATAGAATA CATTTTGTAA 
651 AAGATGTTTA TGCATCAAAA AATATAATTT TCTGCTTGCA AAAAAAAAAA 
701 AAAAAAAAAA AAAAAAG 
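
The statement that the start codon at Bp 15 matches the Kozak consensus ANNatgG can be illustrated on the first 50 bases of the insert. This is a minimal sketch: the function name is hypothetical, and it checks only the two context positions named by the pattern (A three bases upstream of the ATG, G directly after it).

```python
# First 50 bases of the insert, copied from the listing above.
INSERT_5PRIME = "ATTTGAATCACATTATGGGATTGCTAGACAGACTTTCAGTCTTGCTTGGC"


def matches_annatgg(seq: str, start_bp: int) -> bool:
    """True if the ATG at 1-based position start_bp lies in an A-N-N-atg-G context."""
    i = start_bp - 1  # convert to 0-based index of the A in ATG
    if i < 3 or i + 4 > len(seq):
        return False  # not enough flanking sequence to evaluate
    return seq[i - 3] == "A" and seq[i:i + 3] == "ATG" and seq[i + 3] == "G"


if __name__ == "__main__":
    print(matches_annatgg(INSERT_5PRIME, 15))
```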

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 15 bp to 572 bp; peptide length: 186 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic
Prosite motifs: ATP_GTP_A (24-32)



1 MGLLDRLSVL LGLKKKEVHV LCLGLDNSGK TTIINKLKPS NAQSQNILPT 

51 IGFSIEKFKS SSLSFTVFDM SGQGRYRNLW EHYYKEGQAI IFVIDSSDRL 

101 RMVVAKEELD TLLNHPDIKH RRIPILFFAN KMDLRDAVTS VKVSQLLCLE 

151 NIKDKPWHIC ASDAIKGEGL QEGVDWLQDQ IQTVKT 
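
The ATP_GTP_A (P-loop) hit listed above can be reproduced with a regular expression. The sketch assumes the PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST]; the function name is hypothetical. Note that the pattern is eight residues long, so with inclusive 1-based coordinates the match spans residues 24 to 31; the 24-32 range printed in the listing may reflect a different end-coordinate convention.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A) written as a regex.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# First 60 residues of the peptide, copied from the listing above.
PEPTIDE_N_TERM = "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS"


def find_p_loop(peptide: str):
    """Return (start, end, motif) with 1-based inclusive coordinates, or None."""
    match = P_LOOP.search(peptide)
    if match is None:
        return None
    return match.start() + 1, match.end(), match.group()


if __name__ == "__main__":
    print(find_p_loop(PEPTIDE_N_TERM))
```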



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_23111, frame 3

TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6
(Arl6) mRNA, complete cds., N = 1, Score = 923, P = 1.1e-92

TREMBL:CEC38D4_5 gene: "C38D4.8"; Caenorhabditis elegans cosmid C38D4,
N = 1, Score = 418, P = 3.6e-39

PIR:S66337 ADP-ribosylation factor 1 - Chlamydomonas reinhardtii, N =
1, Score = 373, P = 2.1e-34

SWISSPROT:ARF1_CHLRE ADP-RIBOSYLATION FACTOR 1., N = 1, Score = 372, P
= 2.7e-34



>TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 

homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds. 
Length = 186 

HSPs:

Score = 923 (138.5 bits), Expect = 1.1e-92, P = 1.1e-92
Identities = 178/186 (95%), Positives = 184/186 (98%)

Query:   1 MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 60
           MGLLDRLS LLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQ+I+PTIGFSIEKFKS
Sbjct:   1 MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS 60

Query:  61 SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 120
           SSLSFTVFDMSGQGRYRNLWEHYYK+GQAIIFVIDSSD+LRMVVAKEELDTLLNHPDIKH
Sbjct:  61 SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH 120

Query: 121 RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180
           RRIPILFFANKMDLRD+VTSVKVSQLLCLE+IKDKPWHICASDAIKGEGLQEGVDWLQDQ
Sbjct: 121 RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180

Query: 181 IQTVKT 186
           IQ VKT
Sbjct: 181 IQAVKT 186
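
The identity figure of the alignment above can be recomputed directly from the two aligned sequences. The following sketch is illustrative: the function name is hypothetical, the sequences are copied from the Query and Sbjct rows of this alignment (which contains no gaps), and the use of integer truncation for the percentage is an assumption that happens to reproduce the reported 95%.

```python
QUERY = (
    "MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS"
    "SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH"
    "RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ"
    "IQTVKT"
)
SBJCT = (
    "MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS"
    "SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH"
    "RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ"
    "IQAVKT"
)


def identity_stats(query: str, subject: str):
    """Return (identical positions, alignment length, truncated percent)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(q == s and q != "-" for q, s in zip(query, subject))
    return identical, len(query), 100 * identical // len(query)


if __name__ == "__main__":
    print(identity_stats(QUERY, SBJCT))
```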





Pedant information for DKFZphtes3_23111, frame 3 



Report for DKFZphtes3_23111.3



[LENGTH] 186

[MW] 21097.69

[pi] 8.72

[HOMOL] TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor homolog
ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 (Arl6) mRNA, complete cds. 4e-94



[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL192w] 1e-36
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL137W] 2e-36
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YBR164c] 2e-32
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164C] 2e-32
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 4e-19
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 4e-19
[FUNCAT] r general function prediction [M. jannaschii, MJ1339] 2e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 4e-05
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKR014c] 2e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-04
[BLOCKS] BL01288C
[BLOCKS] BL01020C SAR1 family proteins
[BLOCKS] BL01019C ADP-ribosylation factors family proteins
[BLOCKS] BL01019B ADP-ribosylation factors family proteins
[BLOCKS] BL01019A ADP-ribosylation factors family proteins

[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 2e-45

[SCOP] d1mh1 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-46

[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 5e-37

[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 4e-61

[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) [Do 4e-33

[PIRKW] glycoprotein 2e-33 

[PIRKW] monomer 3e-31 

[PIRKW] P-loop 2e-35 

[PIRKW] lipoprotein 2e-33 

[PIRKW] GTP binding 2e-35 

[SUPFAM] ADP-ribosylation factor 2e-35 

[PROSITE] ATP_GTP_A 1 

[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 

[KW] Alpha_Beta 

[KW] 3D 

[KW] LOW_COMPLEXITY 5.91 % 

SEQ MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 

SEG . .xxxxxxxxxxx 

IhurA CCCCEEEEEETTTTCHHHHHHHHCCCCEEEE— EEETTEEEEEEEE 



SEQ SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH

SEG 

IhurA TTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHHTTTT-- 

SEQ RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 

SEG 

IhurA TTTEEEEEEETTTTTTTCCHHHHHHHHCGGGTTTTCEEEEECBTTTTBTHHHHHHHHHHH 

SEQ IQTVKT 

SEG 

IhurA HHHHC. 



Prosite for DKFZphtes3_23111.3 
PS00017 24->32 ATP GTP A PDOC00017 



Pfam for DKFZphtes3_23111.3 



HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM      *GMgWfsIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgE..IVTTI
          MG++ ++ ++GL +KE+++L LGLDN+GKTTI+++LK+ ++
Query   1 -MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNIL 48

HMM       PTIGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaD
          PTIGF +E+ + ++F+V+D GQ + R +W HYY + ++II+V+DS+D
Query  49 PTIGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSD 98

HMM       RDRMeEaKqELHaMLNEEEL..rDAPlLIFANKQDLPgAMSesEIREaLG
          R RM AK+EL+ +LN+ ++ R+ P+L FANK DL++A+++ +++ +L
Query  99 RLRMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLC 148

HMM       LHelRCnRPWYIQMCCAVtGEGLYEGMDWLSNYIr.KRkK*
          L++I+ + PW+I +++A++GEGL+EG DWL ++I+ K
Query 149 LENIK-DKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT 186

DKFZphtes3_23nl9

group: testes derived

DKFZphtes3_23nl9 encodes a novel 387 amino acid protein with similarity to rat protein kinase
C-interacting RBCC protein 1.

The novel protein does not contain the RING-B box-coiled coil (RBCC) motif of RBCC protein 1,
and thus is not a member of this subgroup of RING finger proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to rat protein kinase C-interacting RBCC protein 1
start at Bp 209 matches kozak consensus PyNNatgG
similarity of C-terminal part to N-terminus of RBCK1
Sequenced by LMU
Locus: unknown
Insert length: 1579 bp

Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515



1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 
51 TTGGCCAGAG TGCACCACTC GGCGCCGTCC CAGGCCCGAC GCTCTGGGCG 
101 CGCCCGGAAC CCCAGGTTCG CGGCCCGTGT TTCCGACCGG CGGAGGGGGC 
151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 
201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 
251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 
301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 
351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 
401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT
451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TCCACCAGGA GGGCCTGGAA
501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 
551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 
601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 
651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 
701 AGCCCTGGAA ACTTGACGGA GAGAGAAGAG CTGGCAGGGA GCCTGGCCCG 
751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 
801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 
851 CCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 
901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 
951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 
1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 
1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 
1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 
1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 
1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 
1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 
1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 
1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 
1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGCCCCCACT GAACTCCGGG 
1451 GACCTCTACT GACTGCTTGC TGGGACAGTC ACCAGGGTTG GGGGGAAGGG 
1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA 
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 

Peptide information for frame 2 
ORF from 209 bp to 1369 bp; peptide length: 387
Category: similarity to known protein
Classification: Cell signaling/communication



1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP
51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS 
101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA 
151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ 
201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV 
251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL 
301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP 
351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1, 
Score = 353, P = 2.8e-32 

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2,
complete cds., N = 1, Score = 353, P = 2.8e-32

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P
= 8.5e-25

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus 
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score 
= 367, P = 9.3e-34



>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds. 
Length = 498 

HSPs: 



Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34 
Identities = 95/212 (44%), Positives = 129/212 (60%)

Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234 

+A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRL V++EDA 
Sbjct: 1 MALSLARAVAGGDEQAAIKYATKLAEQRVPLRVQVKPEVSPTQDIRLCVSVEDAYM 56 

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294

+ + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD 
Sbjct: 57 -HTVTIWLTVRPDMTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115 

Query: 295 AFLYLLSAPREAPATGPSPQHPQK MDGELG — RLFPPSLG-LPPG-PQPAASSLP 345 

A+LYLLSA T +PQ Q+ M +LG L S G L P P+P + P 

Sbjct: 116 AYLYLLSARN TSLNPQELQRQRQLRMLEDLGFKDLTLQSRGPLEPVLPKPRTNQEP 171 

Query: 346 SPLQP— SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379 

+P P W CP CTFIN P RPGCEMC RP T+ 
Sbjct: 172 GQPDAAPESPPVGWQCPGCTFINKPTRPGCEMCCRARPETY 212 

Pedant information for DKFZphtes3_23nl9, frame 2 



Report for DKFZphtes3_23nl9.2 

[LENGTH] 387 

[MW] 39949.29 

[pi] 5.53 

[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds. 1e-22
[BLOCKS] BL00578B 
[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 17.57 % 

SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxx 



PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee

SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT

SEG 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEQ VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA 

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV 

SEG xxxxxxxxxxx . . 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL 

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQHPQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 



(No Prosite data available for DKFZphtes3_23nl9.2)
(No Pfam data available for DKFZphtes3_23nl9.2) 



DKFZphtes3_26g22

group: intracellular transport/trafficking

DKFZphtes3_26g22 encodes a novel 898 amino acid protein with similarity to kinesins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a kinesin motor domain
signature. Kinesin is a microtubule-associated force-producing protein that plays a role in
organelle transport. It is an oligomeric complex composed of two heavy chains and two light
chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy
chain contains a large globular N-terminal domain which is responsible for the motor activity
of kinesin, which is known to hydrolyze ATP and to bind and move on microtubules. Several
proteins involved in chromosome segregation and cell division contain this motor domain, such
as Drosophila claret segregational protein (ncd), Drosophila kinesin-like protein (nod), human
CENP-E and human mitotic kinesin-like protein-1 (MKLP-1). The novel protein is a new
kinesin-like protein.

The new protein can find application in modulating chromosome transport in mitosis and meiosis
and modulation of cell division.

strong similarity to kinesins
Sequenced by EMBL
Locus: unknown

Insert length: 3032 bp 

No poly A stretch found, no polyadenylation signal found 



1 CTGAAGCGCT GGGAGGCGGA CATTAAAGTG AAGTGGTTGC GGTAACCTGG 
51 CCTGGGCCTG AAGTGAGTGA GAGGCACATG AAGAGAAGTA TTCAAGTATT 
101 TATACAGATA GGAATCAAGA TAATCAACAA TGTCIGTCAC TGAGGAAGAC 
151 CTGTGCCACC ATATGAAAGT AGTAGTTCGT GTACGTCCGG AAAACACTAA 
201 AGAAAAAGCA GCTGGATTTC ATAAAGTGGT TCATGTTGTG GATAAACATA 
251 TCCTAGTTTT TGATCCCAAA CAAGAAGAAG TCAGTTTTTT CCATGGAAAG 
301 AAAACTACAA ATCAAAATGT TATAAAGAAA CAAAATAAGG ATCTTAAATT 
351 TGTATTTGAT GCTGTTTTTG ATGAAACGTC AACTCAGTCA GAAGTTTTTG 
401 AACACACTAC TAAGCCAATT CTTCGTAGTT TTTTGAATGG ATATAATTGC 
451 ACAGTACTTG CCTATGGTGC CACTGGTGCT GGGAAGACCC ACACTATGCT 
501 AGGATCAGCT GATGAACCTG GAGTGATGTA TCTAACAATG TTACACCTTT 
551 ACAAATGCAT GGATGAGATT AAAGAAGAGA AAATATGTAG TACTGCAGTT 
601 TCATATCTGG AGGTATATAA TGAACAGATT CGTGATCTCT TAGTAAATTC 
651 AGGGCCACTT GCTGTCCGGG AAGATACCCA AAAAGGGGTG GTCGTTCATG 
701 GACTTACTTT ACACCAGCCC AAATCCTCAG AAGAAATTTT ACATTTATTG 
751 GATAATGGAA ACAAAAACAG GACACAACAT CCCACTGATA TGAATGCCAC 
801 ATCTTCTCGT TCTCATGCTG TTTTCCAAAT TTACTTGCGA CAACAAGACA 
851 AAACAGCAAG TATCAATCAA AATGTCCGTA TTGCCAAGAT GTCACTCATT 
901 GACCTGGCAG GATCTGAGCG AGCAAGTACT TCCGGTGCTA AGGGGACCCG 
951 ATTTGTAGAA GGCACAAATA TTAATAGATC ACTTTTAGCT CTTGGGAATG 
1001 TCATCAATGC CTTAGCAGAT TCAAAGAGAA AGAATCAGCA TATCCCTTAC 
1051 AGAAATAGTA AGCTTACTCG CTTGTTAAAG GATTCTCTTG GAGGAAACTG 
1101 TCAAACTATA ATGATAGCTG CTGTTAGTCC TTCCTCTGTA TTCTACGATG 
1151 ACACATATAA CACTCTTAAG TATGCTAACC GGGCAAAGGA CATTAAATCT 
1201 TCTTTGAAGA GCAATGTTCT TAATGTCAAT AATCATATAA CTCAATATGT 
1251 AAAGATCTGT AATGAGCAGA AGGCAGAGAT TTTATTGTTA AAAGAAAAAC 
1301 TAAAAGCCTA TGAAGAACAG AAAGCCTTCA CTAATGAAAA TGACCAAGCA 
1351 AAGTTAATGA TTTCAAACCC TCAGGAAAAA GAAATCGAAA GGTTTCAAGA 
1401 AATCCTGAAC TGCTTGTTCC AGAATCGAGA AGAAATTAGA CAAGAATATC 
1451 TGAAGTTGGA AATGTTACTT AAAGAAAATG AACTTAAATC ATTCTACCAA 
1501 CAACAGTGCC ATAAACAAAT AGAAATGATG TGTTCTGAAG ACAAAGTAGA 
1551 AAAGGCCACT GGAAAACGAG ATCATAGACT TGCAATGTTG AAAACTCGTC 
1601 GCTCCTACCT GGAGAAAAGG AGGGAGGAGG AATTGAAGCA ATTTGATGAG 
1651 AATACTAATT GGCTCCATCG TGTCGAAAAA GAAATGGGAC TCTTAAGTCA 
1701 AAACGGTCAT ATTCCAAAGG AACTCAAGAA AGATCTTCAT TGTCACCATT 
1751 TGCACCTCCA GAACAAAGAT TTGAAAGCAC AAATTACACA TATGATGGAT 
1801 CTAGCTTGTC TTCAGGAACA GCAACACAGG CAGACTGAAG CAGTATTGAA 
1851 TGCTTTACTT CCAACCCTAA GAAAACAATA TTGCACATTA AAAGAAGCCG 
1901 GCCTGTCAAA TGCTGCTTTT GAATCTGACT TCAAAGAGAT CGAACATTTG 
1951 GTAGAGAGGA AAAAAGTGGT AGTTTGGGCT GACCAAACTG CCGAACAACC 
2001 AAAGCAAAAC GATCTACCAG GGATTTCTGT TCTTATGACC TTTCCACAAC 
2051 TTGGACCAGT TCAGCCTATT CCTTGTTGCT CATCTTCAGG TGGAACTAAT 
2101 CTGGTTAAGA TTCCTACAGA AAAAAGAACT CGGAGAAAAC TAATGCCATC 
2151 TCCCTTGAAA GGACAGCATA CTCTAAAGTC TCCACCATCT CAAAGTGTGC 
2201 AGCTCAATGA TTCTCTTAGC AAAGAACTTC AGCCTATTGT ATATACACCA 
2251 GAAGACTGTA GAAAAGCTTT TCAAAATCCG TCTACAGTAA CCTTAATGAA 
2301 ACCATCATCA TTTACTACAA GTTTTCAGGC TATCAGCTCA AACATAAACA 
2351 GTGATAATTG TCTGAAAATG TTGTGTGAAG TAGCTATCCC TCATAATAGA 
2401 AGAAAAGAAT GTGGACAGGA GGACTTGGAC TCTACATTTA CTATATGTGA 
2451 AGACATCAAG AGCTCGAAGT GTAAATTACC CGAACAAGAA TCACTACCAA 
2501 ATGATAACAA AGACATTTTA CAACGGCTTG ATCCTTCTTC ATTCTCAACT 
2551 AAGCATTCTA TGCCTGTACC AAGCATGGTG CCATCCTACA TGGCAATGAC 
2601 TACTGCTGCC AAAAGGAAAC GGAAATTAAC AAGTTCTACA TCAAACAGTT 
2651 CGTTAACTGC AGACGTAAAT TCTGGATTTG CCAAACGTGT TCGACAAGAT 
2701 AATTCAAGTG AGAAGCACTT ACAAGAAAAC AAACCAACAA TGGAACATAA 
2751 AAGAAACATC TGTAAAATAA ATCCAAGCAT GGTTAGAAAA TTTGGAAGAA 
2801 ATATTTCAAA AGGAAATCTA AGATAAATCA CTTCAAAACC AAGCAAAATG 
2851 AAGTTGATCA AATCTGCTTT TCAAAGTTTA TCAATACCCT TTCAAAAATA 
2901 TATTTAAAAT CTTTGAAAGA AGACCCATCT TAAAGCTAAG TTTACCCAAG 
2951 TACTTTCAGC AAGCAGAAAA ATGAAACTCT TTGTTTTCTT CTTTTGTGTT 
3001 CTAAAAAAAT AAAATTTCAA AAGAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 130 bp to 2823 bp; peptide length: 898 
Category: strong similarity to known protein 
Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (113-121) 
KINESIN_MOTOR_DOMAIN1 (252-264) 
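
The stated ORF coordinates and peptide length can be cross-checked with a few lines of Python. This is only a sketch: it assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range, an assumption that happens to reproduce the stated length. The function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the stop codon is not part of the quoted range.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 130 bp to 2823 bp, as listed above.
print(orf_peptide_length(130, 2823))  # 898
```

The result agrees with the listed peptide length of 898 residues.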



1 MSVTEEDLCH HMKVVVRVRP ENTKEKAAGF HKVVHVVDKH ILVFDPKQEE 
51 VSFFHGKKTT NQNVIKKQNK DLKFVFDAVF DETSTQSEVF EHTTKPILRS 
101 FLNGYNCTVL AYGATGAGKT HTMLGSADEP GVMYLTMLHL YKCMDEIKEE 
151 KICSTAVSYL EVYNEQIRDL LVNSGPLAVR EDTQKGVVVH GLTLHQPKSS 
201 EEILHLLDNG NKNRTQHPTD MNATSSRSHA VFQIYLRQQD KTASINQNVR 
251 IAKMSLIDLA GSERASTSGA KGTRFVEGTN INRSLLALGN VINALADSKR 
301 KNQHIPYRNS KLTRLLKDSL GGNCQTIMIA AVSPSSVFYD DTYNTLKYAN 
351 RAKDIKSSLK SNVLNVNNHI TQYVKICNEQ KAEILLLKEK LKAYEEQKAF 
401 TNENDQAKLM ISNPQEKEIE RFQEILNCLF QNREEIRQEY LKLEMLLKEN 
451 ELKSFYQQQC HKQIEMMCSE DKVEKATGKR DHRLAMLKTR RSYLEKRREE 
501 ELKQFDENTN WLHRVEKEMG LLSQNGHIPK ELKKDLHCHH LHLQNKDLKA 
551 QIRHMMDLAC LQEQQHRQTE AVLNALLPTL RKQYCTLKEA GLSNAAFESD 
601 FKEIEHLVER KKVVVWADQT AEQPKQNDLP GISVLMTFPQ LGPVQPIPCC 
651 SSSGGTNLVK IPTEKRTRRK LMPSPLKGQH TLKSPPSQSV QLNDSLSKEL 
701 QPIVYTPEDC RKAFQNPSTV TLMKPSSFTT SFQAISSNIN SDNCLKMLCE 
751 VAIPHNRRKE CGQEDLDSTF TICEDIKSSK CKLPEQESLP NDNKDILQRL 
801 DPSSFSTKHS MPVPSMVPSY MAMTTAAKRK RKLTSSTSNS SLTADVNSGF 
851 AKRVRQDNSS EKHLQENKPT MEHKRNICKI NPSMVRKFGR NISKGNLR 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_26g22, frame 1 

SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13., N = 3, 
Score = 874, P = 9e-93 

TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila 
melanogaster kinesin like protein 67a mRNA, complete cds., N = 1, Score 
= 880, P = 4.2e-88 

TREMBL:SPBC649_1 gene: "SPBC649.01c"; product: "putative kinesin-like 
protein"; S.pombe chromosome II cosmid c649., N = 3, Score = 814, P = 
9.8e-86 

PIR:S64238 kinesin-related protein KIP3 - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 802, P = 2.5e-83 



>TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila 
melanogaster kinesin like protein 67a mRNA, complete cds. 
Length = 814 

HSPs: 

Score = 880 (132.0 bits), Expect = 4.2e-88, P = 4.2e-88 
Identities = 181/345 (52%), Positives = 238/345 (68%) 

Query:   11 HMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFF-HGKKTTNQNVIKKQN 69
            ++KV VRVRP N +E       ++ V+D+  L+FDP +E+  FF  G K   +++ K+ N
Sbjct:    8 NIKVAVRVRPYNVRELEQKQRSIIKVMDRSALLFDPDEEDDEFFFQGAKQPYRDITKRMN 67

Query:   70 KDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKTHTMLGSADE 129
            K L   FD VFD  ++  ++FE  T P++ + LNGYNC+V  YGATGAGKT TMLGS
Sbjct:   68 KKLTMEFDRVFDIDNSNQDLFEECTAPLVDAVLNGYNCSVFVYGATGAGKTFTMLGSEAH 127

Query:  130 PGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVREDTQKGVVV 189
            PG+ YLTM  L+  +    + +     VSYLEVYNE + +LL  SGPL +RED   GVVV
Sbjct:  128 PGLTYLTMQDLFDKIQAQSDVRKFDVGVSYLEVYNEHVMNLLTKSGPLKLREDNN-GVVV 186

Query:  190 HGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNV 249
             GL L    S+EE+L +L  GN +RTQHPTD NA SSRSHA+FQ+++R  ++     + V
Sbjct:  187 SGLCLTPIYSAEELLRMLMLGNSHRTQHPTDANAESSRSHAIFQVHIRITERKTDTKRTV 246

Query:  250 RIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKRKNQHIPYRN 309
               K+S+IDLAGSERA+++   G RF EG +IN+SLLALGN IN LAD  +   HIPYR+
Sbjct:  247 ---KLSMIDLAGSERAASTKGIGVRFKEGASINKSLLALGNCINKLADGLK---HIPYRD 300

Query:  310 SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI 355
            S LTR+LKDSLGGNC+T+M+A VS SS+ Y+DTYNTLKYA+RAK I
Sbjct:  301 SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI 346
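
The identity count behind such an alignment can be reproduced from the Query and Sbjct lines alone. The sketch below counts identical columns in the last block (Query 310-355 against Sbjct 301-346) and shows that the percentages reported for this HSP are consistent with integer truncation rather than rounding. The function names are hypothetical, and the truncation rule is inferred from the reported figures (238/345 is about 68.99, yet 68% is printed), not taken from the search program's documentation.

```python
def count_identities(query: str, sbjct: str) -> int:
    """Count alignment columns where both sequences carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent_truncated(part: int, total: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting rule)."""
    return 100 * part // total


QUERY_310_355 = "SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI"
SBJCT_301_346 = "SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI"

print(count_identities(QUERY_310_355, SBJCT_301_346))  # identities in this block
print(percent_truncated(181, 345), percent_truncated(238, 345))  # 52 68
```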





Pedant information for DKFZphtes3_26g22, frame 1 



Report for DKFZphtes3_26g22.1 



[LENGTH] 898 
[MW] 102281.63 
[pI] 9.09 
[HOMOL] SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13. 3e-97 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YGL216w] 2e-88 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL216w] 2e-88 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YGL216w] 2e-88 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL216w] 2e-88 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins 
[S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 5e-42 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKL079w] 
4e-28 
[BLOCKS] BL00411H 
[BLOCKS] BL00411G 
[BLOCKS] BL00411F 
[BLOCKS] BL00411E Kinesin motor domain proteins 
[BLOCKS] BL00411C Kinesin motor domain proteins 
[BLOCKS] BL00411B Kinesin motor domain proteins 
[BLOCKS] BL00411A Kinesin motor domain proteins 
[SCOP] d2kin.1 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 1e-117 
[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 1e-112 
[PIRKW] nucleus 6e-87 
[PIRKW] heterodimer 4e-68 
[PIRKW] DNA binding 9e-60 
[PIRKW] heterotetramer 2e-54 
[PIRKW] mitosis 9e-60 
[PIRKW] microtubule binding 4e-68 
[PIRKW] ATP 6e-87 
[PIRKW] phosphoprotein 5e-59 
[PIRKW] heterotrimer 4e-68 
[PIRKW] purine nucleotide binding 1e-26 
[PIRKW] P-loop 6e-87 
[PIRKW] coiled coil 4e-68 
[PIRKW] heptad repeat 3e-62 
[PIRKW] methylated amino acid 2e-54 
[PIRKW] hydrolase 2e-54 
[PIRKW] GTP binding 1e-60 
[PIRKW] cell division 5e-57 
[SUPFAM] kinesin-related protein KIP1 3e-50 
[SUPFAM] kinesin-related protein CIN8 7e-33 
[SUPFAM] kinesin heavy chain 2e-54 
[SUPFAM] suppressor protein SMY1 1e-26 
[SUPFAM] kinesin-related protein KIF3 4e-68 
[SUPFAM] kinesin-related protein KIF2 1e-46 
[SUPFAM] kinesin-related protein unc-104 7e-60 
[SUPFAM] unassigned kinesin-related proteins 6e-87 
[SUPFAM] centromere protein E 3e-54 
[SUPFAM] kinesin-related protein KLP61F 5e-57 
[SUPFAM] kinesin-related protein MKLP-1 2e-28 
[SUPFAM] pleckstrin repeat homology 7e-60 
[SUPFAM] kinesin-related protein KIF1B 4e-61 
[SUPFAM] kinesin motor domain homology 6e-87 
[SUPFAM] kinesin-related protein KLPA 1e-43 
[SUPFAM] kinesin-related protein nodA 1e-30 
[SUPFAM] kinesin-related protein Eg5 5e-59 
[PROSITE] ATP_GTP_A 1 
[PROSITE] KINESIN_MOTOR_DOMAIN1 1 
[PFAM] Kinesin motor domain 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 8.57 % 



SEQ MSVTEEDLCHHMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFFHGKKTT 
SEG 
3kar- TBEEE 

SEQ NQNVIKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKT 
SEG 
3kar- EEEETTTTTTEEEEEETEEETTTTCHHHHHHHHHH-HHHGGGGCCCEEEEEECTTTTCHH 

SEQ HTMLGSADEPGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVR 
SEG 
3kar- HHHHTTTT--THHHHHHHHHHHHHHHHGGGCEEEEEEEEEEEETTEEEETT-TCCCCEEE 

SEQ EDTQKGVVVHGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQD 
SEG 
3kar- EETTTEEEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEEEEE 

SEQ KTASINQNVRIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKR 
SEG 
3kar- TTTTCEE EEEEEEEECCCCCCCCCC HHHHHHHHHHHHHHHHHHHHHHHHTTTT 

SEQ KNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDIKSSLK 
SEG xxxxx 
3kar- TTTCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHHH 

SEQ SNVLNVNNHITQYVKICNEQKAEILLLKEKLKAYEEQKAFTNENDQAKLMISNPQEKEIE 
SEG xxxxxxxx xxxxxxxxxxxxxxxxxxxxx 
3kar- 

SEQ RFQEILNCLFQNREEIRQEYLKLEMLLKENELKSFYQQQCHKQIEMMCSEDKVEKATGKR 
SEG xxxxxxxxxxxxx 
3kar- 

SEQ DHRLAMLKTRRSYLEKRREEELKQFDENTNWLHRVEKEMGLLSQNGHIPKELKKDLHCHH 
SEG xxxxxxxxxxx 
3kar- 

SEQ LHLQNKDLKAQIRHMMDLACLQEQQHRQTEAVLNALLPTLRKQYCTLKEAGLSNAAFESD 
SEG xxx 
3kar- 

SEQ FKEIEHLVERKKVVVWADQTAEQPKQNDLPGISVLMTFPQLGPVQPIPCCSSSGGTNLVK 
SEG 
3kar- 

SEQ IPTEKRTRRKLMPSPLKGQHTLKSPPSQSVQLNDSLSKELQPIVYTPEDCRKAFQNPSTV 
SEG 
3kar- 

SEQ TLMKPSSFTTSFQAISSNINSDNCLKMLCEVAIPHNRRKECGQEDLDSTFTICEDIKSSK 
SEG 
3kar- 

SEQ CKLPEQESLPNDNKDILQRLDPSSFSTKHSMPVPSMVPSYMAMTTAAKRKRKLTSSTSNS 
SEG xxxxxxxxxxxxx 
3kar- 

SEQ SLTADVNSGFAKRVRQDNSSEKHLQENKPTMEHKRNICKINPSMVRKFGRNISKGNLR 
SEG xxx 
3kar- 



Prosite for DKFZphtes3_26g22.1 

PS00017 113->121 ATP_GTP_A PDOC00017 

PS00411 252->264 KINESIN_MOTOR_DOMAIN1 PDOC00343 
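
As an illustration of how a Prosite hit such as ATP_GTP_A can be located, the sketch below scans residues 101-150 of the peptide with a regular expression for the P-loop pattern `[AG]-x(4)-G-K-[ST]` (PROSITE PS00017). The helper name and the fragment constants are hypothetical. The sketch reports an inclusive end position, so it yields 113-120 for the eight-residue match, whereas the report above lists 113->121; the difference presumably reflects the report's end-coordinate convention, which is an assumption here.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# Residues 101-150 of the DKFZphtes3_26g22 peptide, copied from the listing.
FRAGMENT_START = 101
FRAGMENT = "FLNGYNCTVLAYGATGAGKTHTMLGSADEPGVMYLTMLHLYKCMDEIKEE"


def find_p_loops(seq: str, offset: int = 1):
    """Return (start, end, match) tuples in 1-based, inclusive coordinates."""
    return [
        (m.start() + offset, m.end() + offset - 1, m.group())
        for m in P_LOOP.finditer(seq)
    ]


print(find_p_loops(FRAGMENT, FRAGMENT_START))  # [(113, 120, 'GATGAGKT')]
```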



Pfam for DKFZphtes3_26g22.1 



HMMNAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds 

R+RP N +E+++G +VV + + + + +++E S 

Query 17 RVRPENTKEKAAGFHKVVHVVD-KHILVFDPKQEEVSFFHGKKTTNQNV 64 

HMM phks Ft FDHVFWWncTQedVYdtvAHPI VDDcFhGYNCT I FAYGQ 

+ F+FD VF+ ++TQ +V++ + PI+ ++++GYNCT++AYG 
Query 65 IKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGA 114 

HMM TGSGKTYTMMGpggehPDHmGIIPRcCHDIFdrldkfqekDhdFWhVkCS 
TG+GKT+TM G + D+ G+ + +++++ D + + + +S 
Query 115 TGAGKTHTMLG SADEPGVMYLTMLHLYKCMDEIK-EEKIC-STAVS 158 

HMM YMEI YNEe I YDLLCPnPqhMkpLnlKEHPNMGpYVqGCTEf HVcSYeDac 
Y+E+YNE+I+DLL+ N ++PL+++E+ G+ V G+T+ +S E+++ 
Query 159 YLEVYNEQIRDLLV-N SGPLAVREDTQKGVVVHGLTLHQPKSSEEIL 204 

HMM hWIWqGnknRHVAaTnMNdhSSRSHtlFTIHVeQrHk. .qcdehvcHSKM 

H +++ GHKNR+ +T MN + -KSSRSH+ + F+I ++Q K + V++ KM 

Query 205 HLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNVRIAKM 254 

HMM NLVDLAGSERvnrTGAEGQRlKEGcNINqSLttLGnVInaLaDgqTKYmY 

+L+DLAGSER++ +GA G+R+ EG+NIN+SL++LGNVINALAD + 
Query 255 SLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSK 299 

HMM gghgHI PYRDSKLTW1 LQDSLGGNcKTcMI ACIWPadWNYEETLSTLRYA 

+++HIPYR SKLT+LL+DSLGGNC T MIA+++P+ + Y++T +TL+YA 
Query 300 RKNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYA 349 

HMM dRAKnlkNkPQINEDPcamalWRrYheQIqdMKhqL* 

+RAK+IK +N++ ++Y+ + K++ 
Query 350 NRAKDIKSSLKSNVLNVN-NHITQYVKICNEQKAEI 384 

DKFZphtes3_27dl 



group: metabolism 

DKFZphtes3_27dl encodes a novel 712 amino acid protein similar to ubiquitin-specific proteases 
(EC 3.1.2.15). 

The novel protein contains both a ubiquitin carboxyl-terminal hydrolases family 2 signature 1 
and a signature 2. Pfam predicts a new member of the ubiquitin carboxyl-terminal hydrolases 
family 2. The ubiquitin system is responsible for the turnover of proteins. Ubiquitin 
carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol 
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of 
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well 
as that of ubiquitinated proteins. 

The novel protein is a new member of the ubiquitin carboxyl-terminal hydrolases family 2, 
represented by proteins such as yeast UBP1-16, human tre-2, human isopeptidase T and others. 

The novel protein can find application in modulation of ubiquitin and protein metabolism in 
cells. 



similarity to ubiquitin-specific proteases 

complete cDNA, complete cds, 4 EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 2871 bp 

Poly A stretch at pos. 2836, no polyadenylation signal found 



1 CCAAACCTGA AAGAGGTTGA TTTGTAATGA TTTGCAGGGG GGCACTGGAG 
51 GCAGCGGCCA GGACTTTTCA CTTAGGAGAT CAGCATTTGC CCTGATGGAA 
101 ACTGGGCGAT CCTGCAGGGA CTGACCTCTG AGTTATCCAA AGGCCGACCT 
151 GGGGAAAGAC TGATTTTGAG GTTTTAATAG TTTTCAGATG CTTCAAGTGT 
201 TGTGAACAGA GACTTGTTTG GATTATGCAT TTCTCAGCTA GACTAAATAA 
251 ATGCTAGCAA TGGATACGTG CAAACATGTT GGGCAGCTGC AGCTTGCTCA 
301 AGACCATTCC AGCCTCAACC CTCAGAAATG GCACTGTGTG GACTGCAACA 
351 CGACCGAGTC CATTTGGGCT TGCCTTAGCT GCTCCCATGT TGCCTGTGGA 
401 AGATATATTG AAGAGCATGC ACTCAAGCAC TTTCAAGAAA GCAGTCATCC 
451 TGTTGCATTG GAGGTGAATG AGATGTACGT TTTTTGTTAC CTTTGTGATG 
501 ATTATGTTCT GAATGATAAC GCAACTGGAG ACCTGAAGTT ACTACGACGT 
551 ACATTAAGTG CCATCAAAAG TCAAAATTAT CACTGCACAA CTCGTAGTGG 
601 GAGGTTTTTA CGGTCCATGG GTACAGGTGA TGATTCTTAT TTCTTACATG 
651 ACGGTGCCCA ATCTCTGCTT CAAAGTGAAG ATCAACTGTA TACTGCTCTT 
701 TGGCACAGGA GAAGGATACT AATGGGTAAA ATCTTTCGAA CATGGTTTGA 
751 ACAATCACCC ATTGGAAGAA AAAAGCAAGA AGAACCATTT CAGGAGAAAA 
801 TAGTAGTAAA AAGAGAAGTA AAGAAAAGAC GGCAGGAATT GGAGTATCAA 
851 GTTAAAGCAG AATTGGAAAG TATGCCTCCA AGAAAGAGTT TACGTTTACA 
901 AGGGCTCGCT CAGTCGACCA TAATAGAAAT AGTTTCTGTT CAGGTGCCAG 
951 CACAAACGCC AGCATCACCA GCAAAAGATA AAGTACTCTC TACCTCAGAA 
1001 AATGAAATAT CTCAAAAAGT CAGTGACTCC TCAGTTAAAC GAAGGCCAAT 
1051 AGTAACTCCT GGTGTAACAG GATTGAGAAA TTTGGGAAAT ACTTGCTATA 
1101 TGAATTCTGT TCTTCAGGTG TTGAGTCATT TACTTATTTT TCGACAATGT 
1151 TTTTTAAAGC TTGATCTGAA CCAATGGCTG GCTATGACTG CTAGCGAGAA 
1201 GACAAGATCT TGTAAGCATC CACCAGTCAC AGATACAGTA GTATATCAAA 
1251 TGAATGAATG TCAGGAAAAA GATACAGGTT TTGTTTGCTC CAGACAATCA 
1301 AGTCTGTCAT CAGGACTAAG TGGTGGAGCA TCAAAAGGTA GAAAGATGGA 
1351 ACTTATTCAG CCAAAGGAGC CAACTTCACA GTACATTTCT CTTTGTCATG 
1401 AATTGCATAC TTTGTTCCAA GTCATGTGGT CTGGAAAGTG GGCGTTGGTC 
1451 TCACCATTTG CTATGCTACA CTCAGTGTGG AGACTCATTC CTGCCTTTCG 
1501 TGGTTACGCC CAACAAGACG CTCAGGAATT TCTTTGTGAA CTTTTAGATA 
1551 AAATACAACG TGAATTAGAG ACAACTGGTA CCAGTTTACC AGCTCTTATC 
1601 CCCACTTCTC AAAGGAAACT CATCAAACAA GTTCTGAATG TTGTAAATAA 
1651 CATTTTTCAT GGACAACTTC TTAGTCAGGT TACATGTCTT GCATGTGACA 
1701 ACAAATCAAA TACCATAGAA CCTTTCTGGG ACTTGTCATT GGAGTTTCCA 
1751 GAAAGGTATC AATGCAGTGG AAAAGATATT GCTTCCCAGC CATGTCTGGT 
1801 TACTGAAATG TTGGCCAAAT TTACAGAAAC TGAAGCTTTA GAAGGAAAAA 
1851 TCTACGTATG TGACCAGTGT AACTCAAAGC GTAGAAGGTT TTCCTCCAAA 
1901 CCAGTTGTAC TCACAGAAGC CCAGAAACAA CTTATGATAT GCCACCTACC 
1951 TCAGGTTCTC AGACTGCACC TCAAACGATT CAGGTGGTCA GGACGTAATA 
2001 ACCGAGAGAA GATTGGTGTT CATGTTGGCT TTGAGGAAAT CTTAAACATG 
2051 GAGCCCTATT GCTGCAGGGA GACCCTGAAA TCCCTCAGAC CAGAATGCTT 
2101 TATCTATGAC TTGTCCGCGG TGGTGATGCA CCATGGGAAA GGATTTGGCT 
2151 CAGGGCACTA CACTGCCTAC TGCTATAATT CTGAAGGAGG GTTCTGGGTA 
2201 CACTGCAATG ATTCCAAACT AAGCATGTGC ACTATGGATG AAGTATGCAA 
2251 GGCTCAAGCT TATATCTTGT TTTATACCCA ACGAGTTACT GAGAATGGAC 
2301 ATTCTAAACT TTTGCCTCCA GAGCTCCTGT TGGGGAGCCA ACATCCCAAT 
2351 GAAGACGCTG ATACCTCGTC TAATGAAATC CTTAGCTGAT CCAAAGACAA 
2401 TGGGGTTTTC TTCCTGTGAT TTATATATAT ACTTTTTAAA AGACTGATGT 
2451 ACCATTTTAA ACTTCATTTT TTCTTGTGAA TCAGTGTATA CTACATTTAT 
2501 ACATTTTATA TCTAACAATT TTTTTTTTTT ACAAAGTATA AATGTATATA 
2551 TCAACTGAAG GTAACTACTT TTTTCATATT TGGAGTTTTA AACTTTTGGT 
2601 GTTTACCTCA GACTGATGTT ACCTCTTTTA TATTTTTATG TCTTAATTGG 
2651 CTCGGATGAT GAACTTGTGC AATCTTCTAC CAACAAAGTT CAAGTGGCAT 
2701 CATTTTATAT ACATGTATCT TTTTCAGGTA TTTTCTATAC AAATTCTTAA 
2751 TAGATGGAAA ATTAGACTCT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA 
2851 AAAAAAAAAA AAAAAAAAAG G 
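
A numbered listing of this kind can be turned back into a contiguous sequence with a small parser. The sketch below is one possible approach, not part of the original annotation pipeline: it treats leading all-digit tokens as the position label (which tolerates labels split by stray spaces, such as "4 01") and joins the remaining 10-base blocks. The function name is hypothetical.

```python
def parse_sequence_lines(lines):
    """Join numbered sequence lines into one string.

    Returns (sequence, first_position). Leading all-digit tokens on a line
    are read as its position label; everything after them is sequence.
    """
    parts = []
    first_position = None
    for line in lines:
        tokens = line.split()
        digits = []
        while tokens and tokens[0].isdigit():
            digits.append(tokens.pop(0))
        if not digits:
            continue  # skip blank or non-sequence lines
        if first_position is None:
            first_position = int("".join(digits))
        parts.extend(tokens)
    return "".join(parts), first_position


last_rows = [
    "2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA",
    "2851 AAAAAAAAAA AAAAAAAAAG G",
]
seq, first = parse_sequence_lines(last_rows)
print(first + len(seq) - 1)  # 2871, the stated insert length
```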



BLAST Results 



No BLAST result 



Medline entries 



98072201: 

Regulation of ubiquitin-dependent processes by deubiquitinating 
enzymes. 

98431658: 

The ubiquitin system. 



Peptide information for frame 2 



ORF from 251 bp to 2386 bp; peptide length: 712 
Category: similarity to known protein 
Prosite motifs: UCH_2_1 (274-290) 
UCH_2_2 (619-638) 
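
The relation between the ORF start and the peptide can be illustrated by translating the first 30 bases of the ORF (positions 251-280 of the nucleotide listing) with the standard genetic code. This is a minimal sketch: the table is built from the conventional TCAG-ordered amino acid string, and the function and constant names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 up to the first stop codon.

    Spaces are ignored; codons with unrecognized characters become 'X'.
    """
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 251-280 as printed in the nucleotide listing.
ORF_START_251 = "ATGCTAGCAA TGGATACGTG CAAACATGTT"
print(translate(ORF_START_251))  # MLAMDTCKHV
```

The output matches the first ten residues of the peptide listed for frame 2.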



1 MLAMDTCKHV GQLQLAQDHS SLNPQKWHCV DCNTTESIWA CLSCSHVACG 
51 RYIEEHALKH FQESSHPVAL EVNEMYVFCY LCDDYVLNDN ATGDLKLLRR 
101 TLSAIKSQNY HCTTRSGRFL RSMGTGDDSY FLHDGAQSLL QSEDQLYTAL 
151 WHRRRILMGK IFRTWFEQSP IGRKKQEEPF QEKIVVKREV KKRRQELEYQ 
201 VKAELESMPP RKSLRLQGLA QSTIIEIVSV QVPAQTPASP AKDKVLSTSE 
251 NEISQKVSDS SVKRRPIVTP GVTGLRNLGN TCYMNSVLQV LSHLLIFRQC 
301 FLKLDLNQWL AMTASEKTRS CKHPPVTDTV VYQMNECQEK DTGFVCSRQS 
351 SLSSGLSGGA SKGRKMELIQ PKEPTSQYIS LCHELHTLFQ VMWSGKWALV 
401 SPFAMLHSVW RLIPAFRGYA QQDAQEFLCE LLDKIQRELE TTGTSLPALI 
451 PTSQRKLIKQ VLNVVNNIFH GQLLSQVTCL ACDNKSNTIE PFWDLSLEFP 
501 ERYQCSGKDI ASQPCLVTEM LAKFTETEAL EGKIYVCDQC NSKRRRFSSK 
551 PVVLTEAQKQ LMICHLPQVL RLHLKRFRWS GRNNREKIGV HVGFEEILNM 
601 EPYCCRETLK SLRPECFIYD LSAVVMHHGK GFGSGHYTAY CYNSEGGFWV 
651 HCNDSKLSMC TMDEVCKAQA YILFYTQRVT ENGHSKLLPP ELLLGSQHPN 
701 EDADTSSNEI LS 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_27dl, frame 2 

PIR:S57591 hypothetical protein YMR223w - yeast (Saccharomyces 
cerevisiae), N = 4, Score = 218, P = 8.4e-38 

SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 
3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING 
PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055)., N = 2, Score = 
300, P = 9.3e-31 

TREMBL:AF079565_1 gene: "Ubp41"; product: "ubiquitin-specific protease 
UBP41"; Mus musculus ubiquitin-specific protease UBP41 (Ubp41) mRNA, 
complete cds., N = 3, Score = 187, P = 8.7e-30 

PIR:158376 hypothetical protein unp - mouse, N = 3, Score = 214, P = 
1.2e-28 

>SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) 
(DEUBIQUITINATING ENZYME 11) (KIAA0055). 

Length = 1,118 

HSPs: 



Score = 300 (45.0 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 95/301 (31%), Positives = 149/301 (49%) 

Query: 381 LCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYAQQDAQEFLCELLDKIQREL- 439 

+ E + + +W+G++ +SP ++ ++ F GY+QQD+QE L L+D + +L 

Sbjct: 826 VAEEFGIIMKALWTGQYRYISPKDFKITIGKINDQFAGYSQQDSQELLLFLMDGLHEDLN 885 

Query: 440 ETTGTSLPALIPTSQRKLIKQVLN VVNNIFHGQLLSQVTCLACDNKSNT 488 

E L + LN ++ +F GQ S V CL C KS T 

Sbjct: 886 KADNRKRYKEENNDHLDDFKAAEHAWQKHKQLNESII V7ALFQGQFKSTVQCLTCHKKSRT 945 

Query: 489 IEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQCNSKRRRFS 548 

E F LSL +C+ +D CL + +K E + + + C C ++R 

Sbjct: 946 FEAFMYLSLPLASTSKCTLQD CL -- RLFSK -- EEKLTDNNRFYCSHCRARR 992 

Query: 549 QK"D17WT TITZlOPfOT MTfHT PHUT RT HT K"R FR W^f^RMMR TTV T rtVHt/f^FF' -F T T WMFPYrf* 605 

++ K++ I LP VL +HLKRF + GR ++K+ V F E L++ Y 

Sbjct: 993 DSLKKIEIWKLPPVLLVHLKRFSYDGRW-KQKLQTSVDFPLENLDLSQYVIGP 1044 

Query: 606 RETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMCTMDEV 665 

+ LK Y+L +V H+G G GHYTAYC N+ W +D ++S ++ V 

Sbjct: 1045 KMHT.KK YHT F^V^MHYG GT.DGGH YTA YCKNA ARORWFKFDDHFVSnT ^VS^V 1096 

Query: 666 CKAQAYILFYTQ RVTE 681 

+ AYILFYT RVT+ 

Sbjct: 1097 



Score = 126 (18.9 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31 
Identities = 41/116 (35%), Positives = 63/116 (54%) 

Query: 200 QVKAELESMPPR--KSLRLQGLAQSTIIEIVSVQVPAQTPASPAKDKVLSTSENEISQKV 257 

Q+ AE + P + +S - Q+ 1+ + P TP ++K + ETS ++ 

Sbjct: 701 QIPAERDREPSKLKRSYSSPDITQA--IQEEEKRKPTVTPTVNRENKPTCYPKAEIS-RL 757 

Query: 258 SDSSVKR-RPIVT---PGVTGLRNLGNTCYMNSVLQVLS---HLLIF--RQCFLKLDLNQ 308 

S S ++ P+ P +TGLRNLGNTCYMNS+LQ L HL + R C+ D+N+ 

Sbjct: 758 SASQIRNLNPVFGGSGPALTGLRNLGNTCYMNSILQCLCNAPHLADYFNRNCYQD-DINR 816 



Score = 50 (7.5 bits), Expect = 8.3e-23, Sum P(2) = 8.3e-23 
Identities = 29/106 (27%), Positives = 51/106 (48%) 

Query: 173 RKKQEEPFQEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQV 232 

+ KQE+ +E+ +++ K R++E E + K + E+ + QA+++SQ 

Sbjct: 475 KNKQEKELRERQQEEQKEKLRKEEQEQKAKKKQEA-EENEITEKQQKAKEEMEKKESEQA 533 

Query: 233 PAQ TPASPAKD KVLSTSENEIS -- QKVSDSSVKRRPI VTPGV 272 

+ T A K+ K S SE+E S +K + KR P IP t 

Sbjct: 534 KKEDKETSAKRGKEITGVKRQSKSEHETSDAKKSVEDRGKRCP -- TPEI 580 



Score = 42 (6.3 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22 





Identities = 13/58 (22%), Positives = 27/58 (46%) 

Query: 167 EQSPIGRKKQEEPFQEKIVVKREVKKRRQELEY-QVKAELESMPPRKSLRLQGLAQST 223 

EQ +KKQE E +++ K+ ++ E Q K E + ++ + G+ + + 

Sbjct: 498 EQEQKAKKKQEAEENEITEKQQKAKEEMEKKESEQAKKEDKETSAKRGKEITGVKRQS 555 
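
The raw scores and bit scores printed for the four HSPs of this hit stand in a fixed ratio. The sketch below only checks that observation on the reported pairs; it is not a statement of how the search program derives bit scores, and the constant and function names are hypothetical.

```python
# (raw score, bits) pairs as reported for the four HSPs above.
REPORTED_HSPS = [(300, 45.0), (126, 18.9), (50, 7.5), (42, 6.3)]


def bits_per_score(pairs):
    """Return the bits/score ratio of each pair, rounded to 3 decimals."""
    return [round(bits / score, 3) for score, bits in pairs]


print(bits_per_score(REPORTED_HSPS))  # [0.15, 0.15, 0.15, 0.15]
```

All four pairs are consistent with about 0.15 bits per raw score unit.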



Pedant information for DKFZphtes3_27dl , frame 2 



Report for DKFZphtes3_27dl.2 



[LENGTH] 712 
[MW] 81155.71 
[pI] 8.21 
[HOMOL] SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) 
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) (DEUBIQUITINATING 
ENZYME 11) (KIAA0055). 4e-32 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, 
palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YBL067c] 3e-19 
[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 4e-17 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHL010c] 3e-12 
[BLOCKS] BL00970A Nuclear transition protein 2 proteins 
[BLOCKS] BL00972D 
[BLOCKS] BL00972C 
[BLOCKS] BL00972B 
[BLOCKS] BL00972A 
[EC] 3.1.2.15 ubiquitin thiolesterase 5e-06 
[PIRKW] alternative splicing 2e-11 
[PIRKW] thiolester hydrolase 5e-06 
[PIRKW] hydrolase 1e-14 
[SUPFAM] RING finger homology 7e-11 
[SUPFAM] deubiquitinating enzyme SSV7 5e-16 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] UCH_2_2 1 
[PROSITE] PKC_PHOSPHO_SITE 17 
[PROSITE] ASN_GLYCOSYLATION 4 
[PROSITE] UCH_2_1 1 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.92 % 
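
The LOW_COMPLEXITY percentage appears to be the fraction of residues masked in the SEG lines of the report. The sketch below checks that reading under two assumptions: that the SEG lines of this report carry 35 `x` marks in total (our count from the printed text) and that the value is rounded to two decimals. The function name is hypothetical.

```python
def low_complexity_percent(masked: int, length: int) -> float:
    """Percentage of residues masked as low complexity, to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100 * masked / length, 2)


# 35 masked positions (assumed count of 'x' marks) in a 712-residue protein.
print(low_complexity_percent(35, 712))  # 4.92
```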



SEQ MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH 

SEG 

PRD ccccccccchhhhhhhhcccccccccceeecccceeeeeeeccccccccchhhhhhhhhh 

SEQ FQESSHPVALEVNEMYVFCYLCDDYVLNDNATGDLKLLRRTLSAIKSQNYHCTTRSGRFL 

SEG 

PRD hhhhccceeecccceeeeeeccccccccccccchhhhhhhhhhhhhcccceeeccccccc 

SEQ RSMGTGDDSYFLHDGAQSLLQSEDQLYTALWHRRRILMGKIFRTWFEQSPIGRKKQEEPF 

SEG 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhh 

SEQ QEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQVPAQTPASP 

SEG xxxxxxxxxxxxxxxx 

PRD hheeehhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccccccc 

SEQ AKDKVLSTSENEISQKVSDSSVKRRPIVTPGVTGLRNLGNTCYMNSVLQVLSHLLIFRQC 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ FLKLDLNQWLAMTASEKTRSCKHPPVTDTVVYQMNECQEKDTGFVCSRQSSLSSGLSGGA 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhchhhhhhhhhhhhhhccccccceeehhhhhcccccccccccccccccccccccc 

SEQ SKGRKMELIQPKEPTSQYISLCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYA 

SEG xxxxx 

PRD ccccceeecccccccchhhhhhhhhhhhhhhhhccceeeeccchhhhhhhhhhhccccch 

SEQ QQDAQEFLCELLDKIQRELETTGTSLPALIPTSQRKLIKQVLNVVNNIFHGQLLSQVTCL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhccccchhhhhhhhc 

SEQ ACDNKSNTIEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQC 

SEG 

PRD cccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhccceeecccc 

SEQ NSKRRRFSSKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFEEILNM 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccccceeeeccccccc 

SEQ EPYCCRETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMC 

SEG 

PRD ccccccccccccccceeeeeeeeeeeecccccccccceeeeccccccceeeecccccccc 

SEQ TMDEVCKAQAYILFYTQRVTENGHSKLLPPELLLGSQHPNEDADTSSNEILS 

SEG 

PRD cchhhhhhhhhhhhhhheeeecccccccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_27dl.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00001 90->94 ASN_GLYCOSYLATION PDOC00001 
PS00001 484->488 ASN_GLYCOSYLATION PDOC00001 
PS00001 653->657 ASN_GLYCOSYLATION PDOC00001 
PS00004 545->549 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 506->509 PKC_PHOSPHO_SITE PDOC00005 
PS00005 542->545 PKC_PHOSPHO_SITE PDOC00005 
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005 
PS00005 580->583 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 611->614 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 223->227 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 525->529 CK2_PHOSPHO_SITE PDOC00006 
PS00006 661->665 CK2_PHOSPHO_SITE PDOC00006 
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006 
PS00007 193->200 TYR_PHOSPHO_SITE PDOC00007 
PS00007 192->200 TYR_PHOSPHO_SITE PDOC00007 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 355->361 MYRISTYL PDOC00008 
PS00008 359->365 MYRISTYL PDOC00008 
PS00008 471->477 MYRISTYL PDOC00008 
PS00008 589->595 MYRISTYL PDOC00008 
PS00009 171->175 AMIDATION PDOC00009 
PS00009 362->366 AMIDATION PDOC00009 
PS00972 274->290 UCH_2_1 PDOC00750 
PS00973 619->638 UCH_2_2 PDOC00750 



Pfam for DKFZphtes3_27dl.2 



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GIqNlGNTCYMNSIIQCL* 

G++NLGNTCYMNS++Q+L 
Query 274 GLRNLGNTCYMNSVLQVL 291 



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM * YdLYgVICHYGntldyGHYWaYVKNenhHRWkWYYFDDEtV* 

YDL +V+ H+G + ++GHY+AY++N + ++W+ +D++ 
Query 619 YDLSAVVMHHGKGFGSGHYTAYCYNSE--GGFWVHCNDSKL 657 

group: transmembrane protein 

DKFZphtes3_27k4 encodes a novel 490 amino acid protein with similarity to two 
hypothetical C. elegans proteins. 

The novel protein contains 10 transmembrane regions and a leucine zipper. It is a member of 
a new family of proteins containing 10 transmembrane domains, which is specific to multicellular 
eukaryotes. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



strong similarity to C. elegans K07H8.2/ZK185.2 
membrane regions: 10 

complete cDNA, complete cds potential start at Bp 109, few EST hits 
Sequenced by GBF 
Locus: unknown



Insert length: 1901 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found



1 GTGATTTACC AGAAAAACCA AGAAGACAGG CACAAAAAAG CAAACGGCAT 
51 TTGGCAAGAT GGATTATCAA CTGCAGTACA GACTTTTAGT AATAGATCTG 
101 AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT 
151 AATGGGCACG CATCATCAAG CTGCAGCCAA AAGTATGATG ACTATGCCAA 
201 TTATAATTAC TGTGATGGAA GGGAGACTTC AGAAACCACT GCCATGTTAC 
251 AAGATGAAGA TATATCTAGT GATGGTGATG AAGATGCTAT TGTAGAAGTG 
301 ACCCCAAAAT TACCAAAGGA ATCCAGTGGC ATCATGGCAT TGCAAATACT 
351 TGTGCCCTTT TTGCTAGCTG GTTTTGGAAC AGTTTCAGCT GGCATGGTAC 
401 TGGATATAGT ACAGCACTGG GAGGTGTTCA GAAAAGTTAC AGAAGTTTTC
451 ATTTTAGTCC CTGCACTTCT TGGTCTCAAA GGGAACTTGG AAATGACATT 
501 GGCATCCAGA TTATCCACTG CAGTAAATAT TGGGAAGATG GATTCACCCA 
551 TTGAAAAGTG GAACCTAATA ATTGGCAACT TGGCTTTAAA GCAGGTTCAG 
601 GCAACAGTAG TGGGTTTTCT AGCAGCTGTG GCAGCAATTA TATTGGGCTG 
651 GATTCCAGAA GGAAAATATT ACCTTGATCA TTCCATACTT CTGTGCTCTA 
701 GCAGTGTGGC AACTGCCTTC ATTGCATCTC TTCTGCAGGG AATAATAATG
751 GTTGGGGTTA TCGTTGGTTC AAAGAAGACT GGTATAAATC CTGATAATGT 
801 TGCTACACCC ATTGCTGCTA GTTTTGGCGA CCTTATAACT CTTGCCATAT 
851 TGGCTTGGAT AAGTCAGGGC TTATACTCCT GTCTTGAGAC CTATTACTAC 
901 ATTTCTCCAT TAGTTGGTGT ATTTTTCTTG GCTCTAACCC CTATTTGGAT 
951 TATAATAGCT GCCAAACATC CAGCCACAAG AACAGTTCTC CACTCAGGCT 
1001 GGGAGCCTGT CATAACAGCT ATGGTTATAA GTAGCATTGG GGGCCTTATT 
1051 CTGGACACAA CTGTATCAGA CCCAAACTTG GTTGGGATTG TTGTTTACAC 
1101 GCCAGTTATT AATGGTATTG GTGGTAATTT GGTGGCCATT CAGGCTAGCA 
1151 GGATTTCTAC CTACCTCCAT TTACATAGCA TTCCAGGAGA ATTGCCTGAT 
1201 GAACCCAAAG GTTGTTACTA CCCATTTAGA ACTTTCTTTG GTCCAGGAGT 
1251 AAATAATAAG TCTGCTCAAG TTCTACTGCT TTTAGTGATT CCTGGACATT 
1301 TAATTTTCCT CTACACTATT CATTTGATGA AAAGTGGTCA TACTTCTTTA 
1351 ACTATAATCT TCATAGTAGT GTATTTATTT GGCGCTGTGT TACAGGTATT 
1401 TACCTTGCTG TGGATTGCTG ACTGGATGGT CCATCACTTC TGGAGGAAAG
1451 GAAAGGACCC GGATAGTTTC TCCATCCCCT ACCTAACAGC ATTGGGTGAT
1501 CTGCTCGGGA CAGCTCTGTT AGCCTTAAGT TTTCATTTTC TTTGGCTTAT 
1551 TGGAGATCGA GATGGAGATG TTGGAGACTA ATAAATTCTA CAAACTGCTC 
1601 TCAAGTTACC AAGGAAGAAA ATACACGACA ACCACTTATG GCTCTTTTTC 
1651 AAAACTCTTA AATCAGTAGT TTGACTTTTG CCAGGGTAAT CTTCAGTTGG 
1701 CCCTGATTCA ATTAAATGGC CTTAATTTTT TTTTAAGGAA TTTGTGTCAA 
1751 AACCAGAATG AAGAGTATTC GTGCTGCTTT TCATAGAATA AATGATAATT 
1801 TGACATAGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1851 AAAAAAAAAA AAGGGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
1901 G 



BLAST Results 



No BLAST result 



Medline entries 
No Medline entry



Peptide information for frame 1 



ORF from 109 bp to 1578 bp; peptide length: 490 
Category: similarity to unknown protein 



1 MEYHSFSEQS FHANNGHASS SCSQKYDDYA NYNYCDGRET SETTAMLQDE 
51 DISSDGDEDA IVEVTPKLPK ESSGIMALQI LVPFLLAGFG TVSAGMVLDI
101 VQHWEVFRKV TEVFILVPAL LGLKGNLEMT LASRLSTAVN IGKMDSPIEK 
151 WNLIIGNLAL KQVQATVVGF LAAVAAIILG WIPEGKYYLD HSILLCSSSV 
201 ATAFIASLLQ GIIMVGVIVG SKKTGINPDN VATPIAASFG DLITLAILAW 
251 ISQGLYSCLE TYYYISPLVG VFFLALTPIW IIIAAKHPAT RTVLHSGWEP 
301 VITAMVISSI GGLILDTTVS DPNLVGIVVY TPVINGIGGN LVAIQASRIS 
351 TYLHLHSIPG ELPDEPKGCY YPFRTFFGPG VNNKSAQVLL LLVIPGHLIF 
401 LYTIHLMKSG HTSLTIIFIV VYLFGAVLQV FTLLWIADWM VHHFWRKGKD 
451 PDSFSIPYLT ALGDLLGTAL LALSFHFLWL IGDRDGDVGD 
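
As an illustration only (it is not part of the original analysis), the relationship between the cDNA listing and the peptide above can be sketched in Python. The sketch assumes the standard genetic code and 1-based coordinates; the names `CODON_TABLE`, `translate` and `fragment` are hypothetical choices made for this example. The fragment is bases 101-150 of the insert, so the ORF start at bp 109 falls on position 9 of the fragment.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, orf_start):
    """Translate from the 1-based position orf_start until a stop codon
    or the end of the sequence. Blanks in the listing are ignored and
    unreadable codons are reported as 'X'."""
    seq = cdna.upper().replace(" ", "")
    peptide = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 101-150 of the DKFZphtes3_27k4 insert, copied from the listing.
fragment = "AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT"
print(translate(fragment, 9))  # MEYHSFSEQSFHAN
```

The output corresponds to the first 14 residues of the peptide listed for frame 1.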



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_27k4, frame 1 

TREMBL:AF036704_2 gene: "ZK185.2"; Caenorhabditis elegans cosmid
ZK185., N = 1, Score = 730, P = 3.1e-72

TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid
K07H8., N = 1, Score = 940, P = 1.7e-94



>TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8.
Length = 507

HSPs:

Score = 940 (141.0 bits), Expect = 1.7e-94, P = 1.7e-94
Identities = 204/412 (49%), Positives = 271/412 (65%) 

Query: 68 LPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPALLGLKGNL 127 

+P ESS ++ Q+L PF +AG G V AG+VL IV W +F ++ E+ ILVPALLGLKGNL 
Sbjct: 82 IPAESSYVLFFQVLFPFAVAGLGMVFAGLVLSIVVTWPLFEEIPEILILVPALLGLKGNL 141

Query: 128 EMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILGWIPEGKY 187

EMTLASRLST N+G MDS ++ +++I NLAL QVQATVV FLA+ A L +IP G + 
Sbjct: 142 EMTLASRLSTLANLGHMDSSKQRKDVVIANLALVQVQATVVAFLASAFAAALAFIPSGDF 201 

Query: 188 YLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFGDLITLAI 247 

H L+C+SS+ATA ASL+ ++MV VIV S+K INPDNVATPI AAS GDL TL + 
Sbjct: 202 DWAHGALMCASSLATACSASLVLSLLMVVVIVTSRKYNINPDNVATPIAASLGDLTTLTV 261

Query: 248 LAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEPVITAMVI 307 

LA+ T +++ +V V FL L P WI IA ++ T+ L++GW PVI +M+I 

Sbjct: 262 LAFFGSVFLKAHNTESWLNVIVIVLFLLLLPFWIKIANENEGTQETLYNGWTPVIMSMLI 321

Query: 308 SSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPGELPDEPK 367

SS GG IL+T V + + Y PV+NG+GGNL A+QASR+STY H G LP+E 

Sbjct: 322 SSAGGFILETAVRRYH--SLSTYGPVLNGVGGNLAAVQASRLSTYFHKAGTVGVLPNEWT 379 

Query: 368 GCYYPF--RTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLM----KSGHTSLTIIFIVV 421

+ R FF +++SA+VLLLLV+PGH+ F + I L K+ T +F + 

Sbjct: 380 VSRFTSVQRAFFSKEWDSRSARVLLLLVVPGHICFNFLIQLFTLTSKNNVTPHGPLFTSL 439 

Query: 422 YLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSF 475 

Y+ A++QV LL++ +V W+ DPD+ IPYLTALGDLLGT LL + F 
Sbjct: 440 YMIAAIIQVVILLFVCQLLVALLWKWKIDPDNSVIPYLTALGDLLGTGLLFIVF 493 
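
The percentages on the "Identities" and "Positives" lines can be reproduced from the raw counts. The following Python sketch is offered only as an illustration; it assumes, based on the figures reported in this document (for example 204/412 shown as 49% although the exact value is about 49.5%), that the reporting program truncates rather than rounds. The function name `blast_percent` is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated toward zero (an assumption
    inferred from the figures reported in this document)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


# Counts taken from the K07H8.2 HSP reported above.
print(blast_percent(204, 412))  # identities
print(blast_percent(271, 412))  # positives
```

Under this assumption the two calls give 49 and 65, in agreement with the HSP header.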



Pedant information for DKFZphtes3_27k4, frame 1 



Report for DKFZphtes3_27k4.1



[LENGTH] 490 

[MW] 53266.39 
[pI] 5.29

[HOMOL] TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 4e-94

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MYRISTYL 7

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PROKAR_LIPOPROTEIN 2

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 10

[KW] LOW_COMPLEXITY 3.06 %





SEQ MEYHSFSEQSFHANNGHASSSCSQKYDDYANYNYCDGRETSETTAMLQDEDISSDGDEDA 

SEG 

PRD cccccccceeeccccccccccccccccccceeecccccccchhhhhhhhcccccccccee 

MEM 

SEQ IVEVTPKLPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPAL

SEG 

PRD eeeeeccccccchhhhhhhhhhhhhhhcccchhhhhhhhhcchhhhhcccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMVMMMMMMMM . MMMMMMMMMMMMMMM 

SEQ LGLKGNLEMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILG

SEG 

PRD ccccchhhhhhhhhhhhhhccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMM MMMMMMMMMMMMMMMMM 

SEQ WIPEGKYYLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFG 

SEG 

PRD hcccceeecccceeehhhhhhhhhhhhhhhhhhhhheeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMKMMMMMMM MMMMMM 

SEQ DLITLAILAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEP

SEG 

PRD cchhhhhhhhhhhhhhhhcceeeeehhhhhhhhhhchhhhhhhhccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMM .... MMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ VITAMVISSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPG 

SEG 

PRD hcchhhhhhcceeeeccccccccceeeeeeceeeecccccceeeeehhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ ELPDEPKGCYYPFRTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIV 

SEG 

PRD cccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhcccccccceeeehhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . . .MMMMMMM 

SEQ VYLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSFHFLWL

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccceeeeeecchhhhhhhhhhhhheeee 

MEM MMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IGDRDGDVGD 

SEG 

PRD eecccccccc 

MEM MM 



Prosite for DKFZphtes3_27k4.1



PS00001 383->387 ASN_GLYCOSYLATION PDOC00001
PS00004 108->112 CAMP_PHOSPHO_SITE PDOC00004
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007
PS00008 90->96 MYRISTYL PDOC00008
PS00008 122->128 MYRISTYL PDOC00008
PS00008 216->222 MYRISTYL PDOC00008
PS00008 220->226 MYRISTYL PDOC00008
PS00008 254->260 MYRISTYL PDOC00008
PS00008 336->342 MYRISTYL PDOC00008
PS00008 339->345 MYRISTYL PDOC00008
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013
PS00029 459->481 LEUCINE_ZIPPER PDOC00029



(No Pfam data available for DKFZphtes3_27k4.1)
DKFZphtes3_27ol4



group: testes derived 

DKFZphtes3_27ol4 encodes a novel 358 amino acid protein with similarity to C. elegans cosmid
C55A6.

The new protein contains a C3HC4 zinc finger (RING finger) signature. The RING finger
structure binds two atoms of zinc and is involved in mediating protein-protein interactions.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to C. elegans C55A6.1 
complete cDNA, complete cds, EST hits 
Sequenced by GBF
Locus: /map="6" 
Insert length: 2158 bp 

Poly A stretch at pos. 2137, polyadenylation signal at pos. 2120



1 CCGAGGCCAG AGAGAAAAGA CTGCGAGGTG GCCGCAGCTG TGGCCGGAGA 
51 GCACAAAGAA TGAACCAGCA GTGGAAGAGA AAATACTGTA AGCTGGCTGA 
101 CTGCTGGTGA AGAAAATGCT TTATTTTTGT GGCAGGCATC TGTGGGATCT 
151 GTAATAGAAA TATATTGGAG TAATTCAAGA TTCTGTGGTT GGCCCTTTTG 
201 ACTGCTCTCT CTACAGGTTT AATTTGGGCA TTTACTCATT TTCATGGCTC 
251 CAAGGACCAT GTATGTGTTG GGGATCTTCA ATATTCATGT TATTTTCTCC 
301 TTTGGTCTTA TATGATTGTT ACCTTTATGA AGCTTTAGTG ATTACAAAGC 
351 ACTTTTTTTG TCCATTTTTA CCTGAGCTTT GTAAACTCTG ATTTGCAGGA 
401 TGGCTGGCTG TGGTGAAATT GATCATTCAA TAAACATGCT TCCTACAAAC 
451 AGGAAAGCGA ACGAGTCCTG TTCTAATACT GCACCTTCTT TAACCGTCCC 
501 TGAATGTGCC ATTTGTCTGC AAACATGTGT TCATCCAGTC AGTCTGCCCT 
551 GTAAGCACGT TTTCTGCTAT CTATGTGTAA AAGGAGCTTC ATGGCTTGGA 
601 AAGCGGTGTG CTCTTTGTCG ACAAGAAATT CCCGAGGATT TCCTTGACAA 
651 GCCAACCTTG TTGTCACCAG AAGAACTCAA GGCAGCAAGT AGAGGAAATG 
701 GTGAATATGC ATGGTATTAT GAAGGAAGAA ATGGGTGGTG GCAGTACGAT 
751 GAGCGCACTA GTAGAGAGCT GGAAGATGCT TTTTCCAAAG GTAAAAAGAA 
801 CACTGAAATG TTAATTGCTG GCTTTCTGTA TGTCGCTGAT CTTGAAAACA 
851 TGGTTCAATA TAGGAGAAAT GAACATGGAC GTCGCAGGAA GATTAAGCGA 
901 GATATAATAG ATATACCAAA GAAGGGAGTA GCTGGACTTA GGCTAGACTG 
951 TGATGCTAAT ACCGTAAACC TAGCAAGAGA GAGCTCTGCT GACGGAGCGG 
1001 ACAGTGTATC AGCACAGAGT GGAGCTTCTG TTCAGCCCCT AGTGTCTTCT 
1051 GTAAGGCCCC TAACATCAGT AGATGGTCAG TTAACAAGCC CTGCAACACC 
1101 ATCCCCTGAT GCAAGCACTT CTCTGGAAGA CTCTTTTGCT CATTTACAAC 
1151 TCAGTGGAGA CAACACAGCT GAAAGGAGTC ATAGGGGAGA AGGAGAAGAA 
1201 GATCATGAAT CACCATCTTC AGGCAGGGTA CCAGCACCAG ACACCTCCAT 
1251 TGAAGAAACT GAATCAGATG CCAGTAGTGA TAGTGAGGAT GTATCTGCAG 
1301 TTGTTGCACA GCACTCCTTG ACCCAACAGA GACTTTTGGT TTCTAATGCA 
1351 AACCAGACAG TACCCGATCG ATCAGATCGA TCGGGAACTG ATCGATCAGT 
1401 AGCAGGGGGT GGAACAGTGA GTGTCAGTGT CAGATCTAGA AGGCCTGATG
1451 GACAGTGCAC AGTAACTGAA GTTTAAATAA AAATGTCTTC AGCTCCATGC
1501 TCAAGGTTGA AAGGGTTACC TGTAAATTTC TGCCCACATA ACATTATACT 
1551 CATCCCTAGT AGTGCATTTT GGGAGTTGGG GTGGGAAGGG GTATGGGAAG 
1601 GATAGACTCA TAATTAAAAT GTCTAACATG TCTCTGTTGA GAAATTTATT 
1651 TAATGTAAGG AACTTGGGTG TTAATAGTTG AGAGCTGTTT AGTAATAACC 
1701 CAGTTTTCTT GAGGTCTGTT TACTTTATAC TTTTTAAAAA CTTCTGTAGT 
1751 TCTTTTGGCC AGTGTGTTTG TATTATCTGT GCATTAATGG TCCTCATCTG 
1801 ACTCCTGCAT TGTGTCTTAT TTTTCTGCAT GGATTGGCAT AAGACCATTA 
1851 CTAAAATTTG GCACCTGTGA GATGTTTGAT ATTATGAACA GGAAACATAA 
1901 TTTAATGTAT GAATAGATGT GAATTTGGGA TTTCAAAATA GATGAATAAC 
1951 AACTATTTTA TAGTAAAGTT ATTGAAATGG AAATGAAAAC AGCCAGTAAC 
2001 TTATGTTTCA GAATGTTTGT AACACACTTC ATGGTGTTCC CATAGGCTTT 
2051 GCTGTCTAGT CTTATAGTTT GAGGTTTTTT TGGTCTGCAT TTTTCTTTTT 
2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA 
2151 AAAAAAAG 



BLAST Results 



Entry HSG117 from database EMBL: 
human STS SHGC-36270.
Score = 1148, P = 8.9e-45, identities = 240/250 
Medline entries



No Medline entry 



Peptide information for frame 1 



ORF from 400 bp to 1473 bp; peptide length: 358 
Category: similarity to unknown protein 
Prosite motifs: ZINC_FINGER_C3HC4 (51-61) 
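
The peptide length follows directly from the ORF coordinates. As a hedged illustration, the Python sketch below assumes that the reported ORF range is 1-based, inclusive, and excludes the stop codon; this assumption is consistent with the coordinates and lengths reported for the clones in this section. The function name `peptide_length` is hypothetical.

```python
def peptide_length(orf_start, orf_end):
    """Number of residues encoded by an ORF given as 1-based inclusive
    coordinates that exclude the stop codon (an assumption)."""
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return n_bases // 3


# ORF coordinates as reported for DKFZphtes3_27ol4.
print(peptide_length(400, 1473))  # 358
```

The same relation holds for the other ORFs reported in this section (109-1578, 328-618 and 131-3274 bp).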



1 MAGCGEIDHS INMLPTNRKA NESCSNTAPS LTVPECAICL QTCVHPVSLP 
51 CKHVFCYLCV KGASWLGKRC ALCRQEIPED FLDKPTLLSP EELKAASRGN 
101 GEYAWYYEGR NGWWQYDERT SRELEDAFSK GKKNTEMLIA GFLYVADLEN 
151 MVQYRRNEHG RRRKIKRDII DIPKKGVAGL RLDCDANTVN LARESSADGA
201 DSVSAQSGAS VQPLVSSVRP LTSVDGQLTS PATPSPDAST SLEDSFAHLQ 
251 LSGDNTAERS HRGEGEEDHE SPSSGRVPAP DTSIEETESD ASSDSEDVSA 
301 VVAQHSLTQQ RLLVSNANQT VPDRSDRSGT DRSVAGGGTV SVSVRSRRPD 
351 GQCTVTEV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27ol4, frame 1

TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6,
N = 2, Score = 165, P = 4.2e-15

SWISSPROT:YWZ6_CAEEL HYPOTHETICAL 39.3 KD PROTEIN C02B8.6 IN CHROMOSOME
X., N = 2, Score = 136, P = 3.1e-11



>TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 
Length = 484 

HSPs: 



Score = 165 (24.8 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15
Identities = 42/106 (39%), Positives = 61/106 (57%)

Query: 75 QEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRN-GWWQYDERTSRELEDAFSKGKK 133

Q +P LD ++ PEE K Y W Y G+N GWW+++ R RE+E+A++ GK

Sbjct: 93 QNVPALDLDA-SICDPEERK--------Y-WIYSGKNQGWWRFEPRNEREIEEAYNAGKC 142

Query: 134 NTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKR---DIID-IPKKGVAGL 180

+ E++I G YV D +QY R + R +KR DDI KG+AG+
Sbjct: 143 HCEVVICGRPYVIDFHQFLQYPRGVPNQARHVKRVSADDFDGIGVKGLAGI 193

Score = 96 (14.4 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15
Identities = 19/54 (35%), Positives = 30/54 (55%)

Query: 35 ECAICLQTCVHPVSLP-CKHVFCYLCVKGASW--LGKRCALCRQEIPEDFLDKPT 86

EC IC + P ++P C H FC++C+KG +G C +CR I + +P+

Sbjct: 11 ECPICQCKMIVPTTIPACGHKFCFICLKGVYMNDMGG-CPMCRGPIDSNIFAQPS 64

Pedant information for DKFZphtes3_27ol4, frame 1



Report for DKFZphtes3_27ol4.1

[LENGTH] 358 

[MW] 38818.90 

[pI] 5.17

[HOMOL] TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 2e-12 

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision 

repair) [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YCR066w] 3e-04 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation,

palmitylation, farnesylation and processing) [S. cerevisiae, YCR066w] 3e-04

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 4e-04 

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 4e-04 

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 



[PROSITE] MYRISTYL 2

[PROSITE] AMIDATION 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] ZINC_FINGER_C3HC4 1

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 2

[PFAM] Zinc finger, C3HC4 type 1

[KW] Irregular

[KW] 3D

[KW] LOW_COMPLEXITY 19.83 %



SEQ MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLPCKHVFCYLCV

SEG

1rmd- TTTTTEETTTEEEETTTEEEEHHHH

SEQ KGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRNGWWQYDERT

SEG

1rmd- HHHHHHCCBTTTTTCBCGGG-CBCC

SEQ SRELEDAFSKGKKNTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKRDIIDIPKKGVAGL

SEG xxxxxxxxxxxxxxx

1rmd-

SEQ RLDCDANTVNLARESSADGADSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDAST

SEG xxxxxxxxxxxx

1rmd-

SEQ SLEDSFAHLQLSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA

SEG x xxxxxxxxxxxxxxxxxxxx

1rmd-

SEQ VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPDGQCTVTEV

SEG xxx xxxxxxxxxxxxxxxxxxxx

1rmd-



Prosite for DKFZphtes3_27ol4.1



PS00001 21->25 ASN_GLYCOSYLATION PDOC00001
PS00001 318->322 ASN_GLYCOSYLATION PDOC00001
PS00004 132->136 CAMP_PHOSPHO_SITE PDOC00004
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005
PS00005 120->123 PKC_PHOSPHO_SITE PDOC00005
PS00005 217->220 PKC_PHOSPHO_SITE PDOC00005
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005
PS00005 274->277 PKC_PHOSPHO_SITE PDOC00005
PS00005 325->328 PKC_PHOSPHO_SITE PDOC00005
PS00005 330->333 PKC_PHOSPHO_SITE PDOC00005
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006
PS00006 120->124 CK2_PHOSPHO_SITE PDOC00006
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006
PS00006 222->226 CK2_PHOSPHO_SITE PDOC00006
PS00006 240->244 CK2_PHOSPHO_SITE PDOC00006
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006
PS00006 287->291 CK2_PHOSPHO_SITE PDOC00006
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006
PS00007 98->107 TYR_PHOSPHO_SITE PDOC00007
PS00008 329->335 MYRISTYL PDOC00008
PS00008 337->343 MYRISTYL PDOC00008
PS00009 66->70 AMIDATION PDOC00009
PS00009 130->134 AMIDATION PDOC00009
PS00009 159->163 AMIDATION PDOC00009
PS00518 51->61 ZINC_FINGER_C3HC4 PDOC00449
Pfam for DKFZphtes3_27ol4.1

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypClrrW CPmC* 

C+IC L + P++LPC+H+FCY C++ C +C 

Query 36 CAIC LQT CVHPVSLPCKHVFCYLCVKGASWLGKRCALC 73 



WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_28dl4 



group: testes derived 

DKFZphtes3_28dl4 encodes a novel 97 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1279 bp 

Poly A stretch at pos. 1232, no polyadenylation signal found



1 GGAGCTCAGA AGTTGGGCAA AGGTCACAGC AGACTTCCTG AAAAGCAGAC
51 ACTGAGGAAC ACAGTGGAGA GCGGGAGTTC ACAGCGACGC AGCTGAGGAC
101 GACGCAGGAC CTCTCCCAAA GGTGCTGCAG CTCCAGCACC AGGGGCCAGG
151 GCTGCGGCGA CAGCAGCTCA GCAACCCTTG CTGTGCTCAA GTTCTTGGGG
201 ATTCAGAGCT AAGTTCAAAA TTTAGAAACA GTGCCTTAAA GACGGGCAAG
251 AAAACCCGGT GTGGGAGTCT GCTCATCTAT GGTTTGTTAC TGCTCTCGCT
301 TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG
351 CAGAGTGAGG AGGAGACAGG AGCGTGTGCA CCTTCCATCT GTGAGAGGCA
401 CACTTCAGTC TGGGTTCAAG ATGCAGAATG GIGCCTACAG CAAAAAAAAA
451 AAAAACACCC TCCTCCCTTC TTTACCATTT GAATGGACAT TTTCCTTACC
501 TGTGATCCCA ACAGAAACAG ATCCAGACCT ATCATGTGAA GTCCACGTTC
551 CAGGATCAGA AGTAACCAGT TTATGGACTG AGCTTACACG GGAAAGTCTA
601 CCCCCGACTC CTTCTGGATA GTAACATACA CAGCTGCATA AAAACGTCTC
651 CAAGGGGACA TACGATGCAT TTGCTTGGTG TCCCAGCCAA GCTCCCCACC
701 GGCGACCTCA CTGTTCCTTA GAGCTCGAGA GCTCGTCTCC TATCAATCAG
751 AGAACCCCAT CAGCTGTGAC CAACAGAGCT GGAGCCCTCT GTGGAGGGAG
801 CTGACCCCAC ACACAGGACA GAGCAGAATC CTGATTATTT TACAAACTGC
851 AAACCTTCTG AGTAAGAAGA CAAAAATATA CATTCCAAGG TATCTGTAAA
901 GTGCTTGGAA GATGCAGACA GCTGCACCGA GGGGCTCTGA TCCATCCACA
951 CGCTGCGCTT TGCTGCGGTC ACACACACGG TCTCAGTCAC GTGATGGTTT
1001 TGCTTTTATT TCTTAAACGG CTGAGTGATA ATCCAGCTAG TGTGCAGTCA
1051 TTTCATACCT TTCAATGGGC GTCACCGCAG TGACGCTGCC CCAGCCCCAT
1101 GCTGAGGGCC GACACAATTC ACGGAACAGA TTCATCATAT TTGGTCTTTA
1151 TGTAAATAAT AAATGTTTTA AAATTGCCTA AATATAAAAA AAAAAAAAAA
1201 AAAAAAAAAA AAAAAAAAAA AAAGGGCGGC CGAAAAAAAA AAAAAAAAAA
1251 AAAAAAAAAA AAAAAAAAAA GGGCGGCCG



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 618 bp; peptide length: 97 
Category: putative protein 



1 MKKPSERGRV RRRQERVHLP SVRGTLQSGF KMQNGAYSKK KKNTLLPSLP 
51 FEWTFSLPVI PTETDPDLSC EVHVPGSEVT SLWTELTRES LPPTPSG 



BLASTP hits 



No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_28dl4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_28dl4, frame 1



Report for DKFZphtes3_28dl4.1



[LENGTH] 97 

[MW] 10945.56 

[pI] 9.80

[PROSITE] MYRISTYL 2 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 3 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 12.37 % 

SEQ MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI 
SEG xxxxxxxxxxxx 



PRD cccccchhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccc 

SEQ PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG 

SEG 

PRD ccccccccceeeecccccchhhhhhhhhhcccccccc 



Prosite for DKFZphtes3_28dl4.1



PS00004 2->6 CAMP_PHOSPHO_SITE PDOC00004
PS00004 41->45 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00006 62->66 CK2_PHOSPHO_SITE PDOC00006
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006
PS00008 24->30 MYRISTYL PDOC00008
PS00008 76->82 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_28dl4.1)
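
For illustration, the Prosite hits listed for this peptide can be reproduced with a simple pattern scan. The Python sketch below is not the program used to generate the report; it assumes the published Prosite consensus patterns for PS00004, PS00005, PS00006 and PS00008, written here as regular expressions, and it assumes the report convention seen in the table, in which the second coordinate equals the start position plus the pattern length. The names `PATTERNS`, `scan` and `peptide` are hypothetical.

```python
import re

# Prosite consensus patterns rewritten as regular expressions (assumed):
#   PS00004  [RK](2)-x-[ST]
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(peptide):
    """Return (name, start, end) for every match, overlaps included.
    start is 1-based; end is start + match length, which mirrors the
    coordinates shown in the report above (an assumed convention)."""
    hits = []
    for name, pattern in PATTERNS.items():
        # The lookahead lets overlapping sites be reported separately.
        for m in re.finditer(f"(?=({pattern}))", peptide):
            start = m.start() + 1
            hits.append((name, start, start + len(m.group(1))))
    return hits


# The 97-residue peptide of DKFZphtes3_28dl4, frame 1.
peptide = (
    "MKKPSERGRVRRRQERVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI"
    "PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG"
)
for name, start, end in scan(peptide):
    print(f"{start}->{end} {name}")
```

Run on the peptide above, the scan yields the same nine sites as the Prosite table for this clone.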
DKFZphtes3_2all



group: testes derived 

DKFZphtes3_2all encodes a novel 1048 amino acid protein with very weak similarity to mucins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to mucin 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4082 bp 

Poly A stretch at pos. 4060, polyadenylation signal at pos. 4034



1 GAGGACTGCG AGCACAGCGG CGGCCGGGTG GCGGGGGTGA GTGGGGCCAG
51 CGGGGCTGGA CAGCAGCGGG CCCCGGGCGC CGCCGCCGCG ATCCCTCCCC
101 GCGCCCGCCG AGCACATCGC CGCCGCCGAG ATGGGCCCTC CGCGGCACCC
151 CCAGGCCGGC GAGATAGAAG CGGGCGGTGC GGGCGGCGGG CGGCGGCTAC
201 AGGTGGAAAT GAGTTCTCAA CAGTTTCCTC GGTTAGGAGC CCCTTCTACC
251 GGGCTGAGCC AGGCCCCTTC TCAGATTGCA AACAGTGGTT CTGCTGGATT
301 GATAAACCCA GCTGCTACAG TCAATGATGA ATCTGGTCGA GATTCTGAAG
351 TCAGTGCCAG GGAGCACATG AGTTCCAGCA GCTCCCTCCA GTCCCGGGAG
401 GAGAAGCAAG AGCCTGTTGT GGTAAGGCCC TATCCACAGG TGCAGATGTT
451 GTCGACACAC CATGCTGTCG CATCAGCCAC ACCTGTTGCA GTGACAGCCC
501 CGCCAGCACA CCTGACGCCA GCAGTGCCAC TTTCATTTTC GGAGGGACTT
551 ATGAAGCCGC CCCCGAAGCC CACCATGCCT AGCCGTCCCA TTGCTCCTGC
601 TCCACCTTCT ACCCTGTCAC TTCCCCCCAA GGTICCAGGG CAGGTTACCG
651 TTACCATGGA GAGTAGCATC CCTCAAGCTT CAGCCATTCC TCTGGCAACA
701 ATCAGTGGAC AACAGGGCCA TCCCAGTAAC CTGCATCACA TCATGACTAG
751 AAATGTGCAA ATGTCTATCA TCCGCAGCAA TGCTCCTGGG CCCCCTCTTC
801 ACATTGGAGC TTCTCATTTA CCTCGAGGTG CAGCTGCTGC TGCTGTGATG
851 TCCAGTTCTA AAGTAACCAC AGTCCTGAGG CCGACCTCAC AGCTGCCAAA
901 TGCTGCTACT GCTCAGCCAG CAGTACAGCA CATCATTCAC CAACCAATCC
951 AGTCTCGGCC ACCTGTGACC ACCTCCAATG CCATCCCTCC TGCTGTGGTA
1001 GCAACTGTCT CAGCCACCAG AGCTCAGTCT CCAGTCATCA CTACGACAGC
1051 GGCGCATGCT ACTGATTCAG CACTTAGTAG GCCAACCTTG TCTATCCAGC
1101 ATCCTCCATC TGCAGCAATC AGTATTCAGC GTCCTGCCCA GTCACGAGAT
1151 GTCACAACAA GAATCACACT ACCATCTCAC CCTGCATTAG GGACGCCAAA
1201 ACAGCAGCTT CATACAATGG CTCAGAAAAC AATCTTCAGT ACTGGCACGC
1251 CAGTGGCTGC AGCCACAGTA GCACCTATTT TGGCAACCAA CACCATTCCT
1301 TCAGCGACCA CAGCTGGATC TGTGTCACAC ACGCAAGCTC CCACAAGTAC
1351 CATTGTTACC ATGACAGTAC CCTCCCATTC CTCCCATGCT ACTGCTGTGA
1401 CCACCTCAAA CATCCCAGTC GCCAAGGTGG TGCCCCAGCA GATCACGCAC
1451 ACTTCTCCTC GGATCCAGCC AGACTACCCT GCCGAGAGGA GTAGCCTGAT
1501 TCCCATCTCC GGACATCGGG CCTCTCCCAA TCCTGTGGCC ATGGAAACCC
1551 GAAGTGACAA CAGACCGTCT GTTCCCGTTC AGTTCCAATA TTTTTTGCCA
1601 ACTTACCCCC CTTCTGCATA CCCACTGGCG GCACATACCT ACACCCCAAT
1651 CACCAGTTCC GTGTCCACTA TCCGACAGTA TCCAGTTTCA GCTCAGGCTC
1701 CAAACTCTGC CATCACAGCT CAGACTGGTG TTGGGGTAGC GTCTACCGTC
1751 CACCTAAACC CCATGCAGTT GATGACAGTG GATGCATCGC ATGCTCGACA
1801 TATTCAAGGG ATCCAGCCAG CACCCATCAG TACCCAGGGT ATCCAGCCGG
1851 CCCCCATTGG GACCCCAGGG ATACAGCCTG CACCACTTGG CACACAGGGA
1901 ATTCACTCAG CAACCCCAAT CAACACACAA GGGCTTCAGC CTGCACCTAT
1951 GGGTACTCAG CAGCCTCAGC CTGAAGGAAA GACTTCAGCA GTGGTGTTGG
2001 CAGATGGAGC CACAATTGTG GCCAACCCTA TTAGCAATCC ATTCAGTGCT
2051 GCTCCAGCAG CAACAACCGT GGTGCAGACC CACAGCCAGA GTGCTAGCAC
2101 CAACGCTCCC GCCCAGGGCT CATCGCCACG GCCAAGCATA CTCCGGAAGA
2151 AACCTGCCAC AGATGGTGCC AAACCCAAGT CTGAAATCCA CGTGTCTATG
2201 GCCACTCCGG TCACTGTGTC CATGGAGACT GTATCCAATC AAAATAATGA
2251 TCAGCCTACC ATTGCCGTCC CTCCAACTGC CCAGCAGCCC CCACCGACCA
2301 TTCCAACTAT GATTGCAGCA GCCAGTCCCC CGTCACAACC AGCCGTTGCC
2351 CTTTCAACCA TTCCTGGAGC GGTCCCCATC ACTCCACCCA TCACCACCAT
2401 TGCAGCTGCA CCACCTCCAT CAGTCACTGT GGGTGGCAGT CTTTCCTCCG
2451 TCTTGGGCCC TCCCGTTCCT GAAATTAAAG TGAAAGAAGA AGTAGAACCA
2501 ATGGATATCA TGAGGCCAGT TTCTGCAGTT CCTCCACTGG CTACCAACAC
2551 TGTGTCTCCA TCTCTTGCAT TGCTGGCAAA CAACTTGTCC ATGCCTACAA
2601 GTGACCTACC ACCTGGTGCC TCCCCAAGGA AAAAGCCTCG AAAGCAACAG
2651 CATGTGATCT CAACAGAAGA AGGTGACATG ATGGAGACAA ACAGCACTGA
2701 TGATGAGAAG TCCACTGCCA AGAGTCTTCT GGTGAAGGCT GAGAAGCGCA
2751 AGTCTCCTCC CAAGGAGTAT ATTGATGAGG AAGGTGTGAG ATATGTCCCA
2801 GTGCGTCCAA GACCCCCCAT TACTTTGCTT CGTCACTATC GGAACCCCTG 
2851 GAAAGCTGCT TACCACCACT TTCAGAGGTA CAGTGACGTC CGGGTCAAAG 
2901 AGGAGAAGAA AGCTATGCTG CAGGAAATAG CTAATCAGAA AGGAGTATCC 
2951 TGTCGTGCTC AAGGCTGGAA AGTCCACCTC TGTGCTGCCC AGTTACTACA 
3001 GCTGACGAAT CTAGAACATG ATGTCTATGA AAGACTTACT AACCTGCAGG 
3051 AAGGGATTAT CCCAAAGAAA AAAGCAGCAA CAGATGATGA TCTCCACCGA 
3101 ATAAACGAAC TGATACAGGG AAATATGCAG AGGTGTAAAC TTGTGATGGA 
3151 TCAAATCAGT GAAGCCAGAG ACTCCATGCT TAAGGTTTTA GATCATAAAG 
3201 ACCGTGTCCT GAAGCTGCTT AACAAGAACG GGACTGTCAA AAAAGTGTCC 
3251 AAATTGAAGC GAAAGGAAAA AGTCTAGACC CAGAACAATC AGGAGATTGG
3301 AAGCAAATTT ATGAAGAATG ATGGTGGGGG TGGGGGGAGG GTTTTGGTTT 
3351 TTTCCAAAGT GGAACATTGA AATAAAGGAA GTGTTCCTTA GTTCCCGTGT 
3401 GAAAGCAGAG GAACCCATGA CATCCAAGGG CGTGAAAGGA TCAGAGCTGA 
3451 CTGGACATAG TGAGCTGCCT TCTTGCGTTC GGGTGCACCC CTGTTAAACC 
3501 TGATCTGTGT CATAAGTGAC TCCGGATGCA TCAGTGTCCA CCAGTTGGAA 
3551 GCAATGACAA GGATGGCTGG CTGGTGTTTT TCAGCCTTCC GGTTTATAGA 
3601 CTGTATTTAT CTAGTGGATT CCTGCAGGCC CCATACTGAG CCTGGACTGA 
3651 AAGTATCCAC TCGGACCATC TGTTATCTCT CTACACTGAA AATAAAACCT 
3701 CTTCCACCCA CCCCATTCGG TTCTTCTGCC TGACCTTCAA ATGCCCATGT 
3751 TGGCCTTTTA CAGCAGTGCC ACGGCACCAA GCGAGCTGCC ACATCTCACA 
3801 CTCTAAAGGG TTTGAACTAT TAGTTCTTGT CATTTTTTAA AAAAAACCAT 
3851 TCCCAAGTGA AATTGTTATA TCGTCTGTCT TGCGTGTGTC AGAACTGGGT 
3901 TTTTGTGGAG GTTCAGAGCA GGCAACACCA TAAGTTGCTC TCAGATCCTT 
3951 GTTCTGAAGT ACATTCTTGG TTATCTGTAC TTCTGTAGCT GGTGTGATGC 
4001 TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG 
4051 AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 131 bp to 3274 bp; peptide length: 1048 
Category: similarity to known protein 



1 MGPPRHPQAG EIEAGGAGGG RRLQVEMSSQ QFPRLGAPST GLSQAPSQIA 

51 NSGSAGLINP AATVNDESGR DSEVSAREHM SSSSSLQSRE EKQEPVVVRP 

101 YPQVQMLSTH HAVASATPVA VTAPPAHLTP AVPLSFSEGL MKPPPKPTMP

151 SRPIAPAPPS TLSLPPKVPG QVTVTMESSI PQASAIPVAT ISGQQGHPSN 

201 LHHIMTTNVQ MSIIRSNAPG PPLHIGASHL PRGAAAAAVM SSSKVTTVLR 

251 PTSQLPNAAT AQPAVQHIIH QPIQSRPPVT TSNAIPPAVV ATVSATRAQS

301 PVITTTAAHA TDSALSRPTL SIQHPPSAAI SIQRPAQSRD VTTRITLPSH 

351 PALGTPKQQL HTMAQKTIFS TGTPVAAATV APILATNTIP SATTAGSVSH 

401 TQAPTSTIVT MTVPSHSSHA TAVTTSNIPV AKVVPQQITH TSPRIQPDYP 

451 AERSSLIPIS GHRASPNPVA METRSDNRPS VPVQFQYFLP TYPPSAYPLA 

501 AHTYTPITSS VSTIRQYPVS AQAPNSAITA QTGVGVASTV HLNPMQLMTV 

551 DASHARHIQG IQPAPISTQG IQPAPIGTPG IQPAPLGTQG IHSATPINTQ 

601 GLQPAPMGTQ QPQPEGKTSA VVLADGATIV ANPISNPFSA APAATTVVQT

651 HSQSASTNAP AQGSSPRPSI LRKKPATDGA KPKSEIHVSM ATPVTVSMET 

701 VSNQNNDQPT IAVPPTAQQP PPTIPTMIAA ASPPSQPAVA LSTIPGAVPI 

751 TPPITTIAAA PPPSVTVGGS LSSVLGPPVP EIKVKEEVEP MDIMRPVSAV

801 PPLATNTVSP SLALLANNLS MPTSDLPPGA SPRKKPRKQQ HVISTEEGDM

851 METNSTDDEK STAKSLLVKA EKRKSPPKEY IDEEGVRYVP VRPRPPITLL 

901 RHYRNPWKAA YHHFQRYSDV RVKEEKKAML QEIANQKGVS CRAQGWKVHL

951 CAAQLLQLTN LEHDVYERLT NLQEGIIPKK KAATDDDLHR INELIQGNMQ 

1001 RCKLVMDQIS EARDSMLKVL DHKDRVLKLL NKNGTVKKVS KLKRKEKV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2all, frame 2 

SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2)., N = 1, 
Score = 334, P = 2.4e-25



PIR:A43932 mucin 2 precursor, intestinal - human (fragments), N = 1, 
Score = 321, P = 3.2e-24 

TREMBL:D88440_1 product: "high molecular mass nuclear antigen"; Gallus 
gallus mRNA for high molecular mass nuclear antigen, partial cds., N =
1, Score = 312, P = 8.3e-24

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast
(Saccharomyces cerevisiae), N = 1, Score = 300, P = 2.1e-22



>SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 
Length = 5,179 

HSPs: 

Score = 334 (50.1 bits), Expect = 2.4e-25, P = 2.4e-25 
Identities = 184/770 (23%), Positives = 263/770 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 3471 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3530 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3531 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3589 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3590 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3649 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3650 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3706 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3707 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTT PI TTTTTVT PTPT PTGTQTPTTT PITT 3766 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3767 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3825 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ -t- +P P +T + + P+ + PT P+ 

Sbjct: 382 6 QTPTTTPI TTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPT PTG — TQTP 3874 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 3875 TTT PI TTTTTVT PTPT PTGTQTPTTT PI TTTTTVTPTPTP — TGTQTPTTT PI TTTTTVT 3932 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3933 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3991

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+

Sbjct: 3992 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4051

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP---PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 4052 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4111

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPS VTVGGSLSSVLGP-PVPEI 782 

P+ TP PIT TT+ PP+ T +++ + PPP 

Sbjct: 4112 TTTVTPTPTPTGTQT-PTTTPITTT-TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 4169

Query: 783 KVKEEVEPMDIMRPVSAVP-PLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

P+ V+ P P T T P+ A + TS+ PP +S + R 

Sbjct: 4170 TQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229

Query: 842 VISTEEGDMMET 853 

+ TE ++ T 
Sbjct: 4230 PL-TESTTLLST 4240 

Score = 328 (49.2 bits), Expect = 1.0e-24, P = 1.0e-24 
Identities = 180/745 (24%), Positives = 254/745 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 
Sbjct: 3540 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3599

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3600 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3658

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3659 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3718

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T+T+PTT T + T + + P 
Sbjct: 3719 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3775

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3776 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3835 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3836 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3894 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3895 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3943 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 

Sbjct: 3944 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4001 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4002 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4060

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 4061 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4120

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T+ TPPTQPTP 

Sbjct: 4121 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4180 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAA-PPPSVTVGGSLSSVLGPPVPEIKVKEE 787 

P+ TP T PI + + PPP + + S P + 

Sbjct: 4181 TTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSSPLTESTTLLST 4240 

Query: 788 VEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMP--TSDLPPGASPR 833

+ P M S PP +T T +P+ + LS P T+ PPG R 

Sbjct: 4241 LPPAIEM--TSTAPP-STPT-APTTTSGGHTLSPPPSTTTSPPGTPTR 4284

Score = 325 (48.8 bits), Expect = 2.2e-24, P = 2.2e-24 
Identities = 186/782 (23%), Positives = 261/782 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P 

Sbjct: 3494 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3553

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3554 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3612 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T TPI 

Sbjct: 3613 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3672 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 3673 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3729 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3730 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3789 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3790 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3848 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+

Sbjct: 3849 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3897

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ + T P QP+ITTV T QT
Sbjct: 3898 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3955

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3956 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4014 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 4015 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4074 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 4075 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4134

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P I V 
Sbjct: 4135 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTV 4184

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQHVISTEEG 848 

P PP T+T +P L+N PSP+ P + + +

Sbjct: 4185 TPTPTPTGTQTGPPTHTST-APIAELTTSN-PPPESSTPQTSRSTSSPLTESTTLLSTLP 4242

Query: 849 DMMETNSTDDEKSTAKSLLVKAEKRKSPP 877 

+E ST + SPP 

Sbjct: 4243 PAIEMTSTAPPSTPTAPTTTSGGHTLSPP 4271 

Score = 324 (48.6 bits), Expect = 2.8e-24, P = 2.8e-24
Identities = 170/717 (23%), Positives = 248/717 (34%) 

Query: 95 PVVVRPYPQVQMLSTHHAVASATP--VAVTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSR 152

P P P +T + +P T PP TP+ P++ + + P P+ P 

Sbjct: 1401 PPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPL-PTTTPSPPIS 1459 

Query: 153 PIAPAPPSTLSLPPKVPGQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

PP+T PP TS +PT+ P I +

Sbjct: 1460 TTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTPITPPASTTT 1516 

Query: 213 IIRSNAPGPPLHIGASHLPRGAAAAAVMSSSKVTTVLRPTSQ--LPNAATAQPAVQHIIH 270

+ + PPP +P S T + PTS LP T P 

Sbjct: 1517 LPPTTTPSPPTTTTTTPPP TTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTT 1571 

Query: 271 QPIQSRP-PVTTSNAIPPAVVATVSA-TRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

P + P P TT+ PP + T T SP TTT + S PT + PP++ 

Sbjct: 1572 EPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTS 1631 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPILATNT 388 

+ + T T P P TP T I +T TP T + + T 

Sbjct: 1632 TTTLPPTTTPSPPPTTTTTP--PPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTTP 1689

Query: 389 IPSATTAGSVSHTQAPTSTIVTMTVPSHSSHATAV-TTSNIPVAKVVPQQITHTSPRIQP 447

P TT + S T P+S ITTPS+++ TT P P TT +P 

Sbjct: 1690 SPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPV-QFQYFLPTYPPSAY-P-----LA 500

+ + P+ P T + P VP+ + +L + P+ + P L 

Sbjct: 1750 TTTSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELI 1809

Query: 501 AHTYTPITSSVSTIR--QYP-VSAQAPNSAITAQTGVG-VASTVHLNPMQLMTVDASHAR 556

P ++ + R YP V + VG + P ++ + A 

Sbjct: 1810 GDVCGPGWAANISCRATMYPDVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPM-AFCLN 1868 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQ-PAPLGTQGIHSATPINTQGLQPAPMGTQQPQ-- 613

+ +Q TQ P+T +PPTI+T+ PP GTQ P 

Sbjct: 1869 YEINVQCCECVTQ PTTMTTTTTENPTPPTTTPITTTTTVTPT PTPTGTQTPTTT 1922

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 1923 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 1982 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMIA 729

T P+ TP +T+ TPPTQPTP 

Sbjct: 1983 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2042 

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2043 TTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPT 2096

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+

Sbjct: 2097 PTGTQTPTTT-PITTTTTVTPT 2117

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 
Sbjct: 2068 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2127 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 2128 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2186 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 2187 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2246 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2247 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2303 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2304 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2363 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2364 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2422 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2423 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2471

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2472 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2529

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2530 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2588 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P + TTT +Q+ +T ++ P+ 

Sbjct: 2589 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2648 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728

T P + TP +T + TPPTQPTP 

Sbjct: 2649 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2708 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2709 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2762

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P + 

Sbjct: 2763 TPTGTQTPTTT-PITTTTTVTPT 2784 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2206 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2265 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2266 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2324 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 2325 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2384 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2385 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2441

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T+T TP T PI 

Sbjct: 2442 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2501

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 
T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T

Sbjct: 2502 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2560

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2561 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2609

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT + + + T PQP+ITTV T QT 

Sbjct: 2610 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2667

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2668 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2726 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P + TTT +Q+ +T ++ P+ 

Sbjct: 2727 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2786 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728

T P+ TP +T+ TPPTQPTP 

Sbjct: 2787 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2846 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2847 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2900 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2901 TPTGTQTPTTT-PITTTTTVTPT 2922 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2321 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2380

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TT V +

Sbjct: 2381 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2439

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 2440 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2499 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T + T+PTT T + T++ P
Sbjct: 2500 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2556 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2557 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2616

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2617 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2675

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2676 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2724

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 
Sbjct: 2725 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2782

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2783 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2841 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2842 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2901

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2902 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2961 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 2962 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3015

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

P P + P T TV+P+ 

Sbjct: 3016 TPTGTQTPTTT-PITTTTTVTPT 3037 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 
Sbjct: 2390 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2449 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V +

Sbjct: 2450 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2508 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 2509 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2568 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 2569 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2625 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2626 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2685

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T

Sbjct: 2686 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2744 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2745 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2793 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 

Sbjct: 2794 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2851

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2852 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2910

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P + TTT +Q+ +T ++ P +

Sbjct: 2911 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2970

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T+ TPPTQPTP 

Sbjct: 2971 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3030 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 3031 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3084 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3085 TPTGTQTPTTT-PITTTTTVTPT 3106 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P 
Sbjct: 2459 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2518

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2519 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2577

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2578 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2637 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 2638 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2694

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2695 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2754

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2755 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2813

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2814 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2862

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 

Sbjct: 2863 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2920

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2921 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2979 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P + TTT +Q+ +T ++ P+ 

Sbjct: 2980 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3039

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T + TPPTQPTP 

Sbjct: 3040 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3099 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 
Sbjct: 3100 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3153

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3154 TPTGTQTPTTT-PITTTTTVTPT 3175 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2528 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2587 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V +

Sbjct: 2588 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2646 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 2647 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2706

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T + T+PTT T + T++ P 
Sbjct: 2707 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2763 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P + GT + T + T TP T PI 

Sbjct: 2764 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2823 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2824 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2882 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2883 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 2931

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT 

Sbjct: 2932 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2989

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2990 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3048 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 3049 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3108 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3109 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3168 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788

P+ TP PIT TT P P+ T G+ + P V
Sbjct: 3169 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3222

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3223 TPTGTQTPTTT-PITTTTTVTPT 3244 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T + + TVTP TP + +PPPT P 
Sbjct: 3080 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3139 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ IT V + 

Sbjct: 3140 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3198 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3199 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3258 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P 
Sbjct: 3259 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3315 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3316 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3375 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3376 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3434 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3435 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3483

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T Q T 

Sbjct: 3484 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3541

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3542 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3600

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P + P+ TTT +Q+ +T ++ P + 

Sbjct: 3601 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 3660 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3661 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3720

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 3721 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3774

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3775 TPTGTQTPTTT-PITTTTTVTPT 3796 

Score = 313 (47.0 bits), Expect = 4.2e-23, P = 4.2e-23 
Identities = 169/695 (24%), Positives = 245/695 (35%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 3655 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3714

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V +

Sbjct: 3715 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3773 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3774 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3833

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3834 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3890

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3891 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3950 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3951 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4009 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 4010 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 4058

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 
Sbjct: 4059 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 4116

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4117 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPT- 4174

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKK 674 

T+ + T+ P P T ++ ++N P + S+P+ S 

Sbjct: 4175 TTPITTT-----TTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229

Query: 675 PATDGAKPKSEIH--VSMATPVTVSMETVSNQNNDQPTIAVPP-TAQQPP--PTIPTMIA 729

PT+ S++M+ ST + T++ PP T PP PT T 

Sbjct: 4230 PLTESTTLLSTLPPAIEMTSTAPPSTPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTG 4289 

Query: 730 AASPPSQPAVALSTI PGAVPITPP--ITTIAAAP-PPSVTVGGSLSSVLGPPVPEI 782 

+ + S P+ V +T P P++ PIT P P SV + L+ P E+ 

Sbjct: 4290 SSSAPTPSTVQTTTTSAWTPTPTPLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV 4349 

Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19
Identities = 138/540 (25%), Positives = 194/540 (35%) 

Query: 278 PVTTSNAIPPAVVATVSATRAQSPVITTTAAH ATDSALSRP--TLSIQHPPSAA 329 

P+TT + + P T+T +P + TTT T + + P T + P 

Sbjct: 1946 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2005 

Query: 330 ISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILAT 386

Q P + TT P+ GT + T + T TP T PI T 

Sbjct: 2006 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2065

Query: 387 NTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSPR 444 

T+ P+ T G+ + T P +T T+T P++ TTT VP TT+ 

Sbjct: 2066 TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGTQ 2124 

Query: 445 IQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHT 503 

P ++ + +P P +T + + P+ + PT P+ T 

Sbjct: 2125 TPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTPT 2173 

Query: 504 YTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGI 561 

TPIT++ +T P QP+ITTV T QT 
Sbjct: 2174 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTP 2231 

Query: 562 QPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ-- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2232 TPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTT 2290

Query: 614 PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2291 PITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP 2350 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMIA 729

T P+ TP +T+ TPPTQPTP 

Sbjct: 2351 TGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTT 2410

Query: 730 AASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

P+ T P PIT TT P P+ T G+ + P V 

Sbjct: 2411 TTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPT 2464

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2465 PTGTQTPTTT-PITTTTTVTPT 2485 

Score = 265 (39.8 bits), Expect = 5.8e-18, P = 5.8e-18 
Identities = 179/746 (23%), Positives = 257/746 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3678 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3737

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TT V +

Sbjct: 3738 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3796 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 3797 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3856

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 
Sbjct: 3857 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3913 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3914 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3973 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3974 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4032 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 4033 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 4081 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T PQP+ITTV T QT 
Sbjct: 4082 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVT 4139 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4140 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTGPP 4198 

Query: 615 EGKTSAVVLADGATIVANPISNPFSAAPA ATTVVQTHSQSA-STNAPA-- -QGSSPRP 668 

TS +A+ T +NP P S+ P +T+ T S + ST PA S+ P 

Sbjct: 4199 T-HTSTAPIAELTT--SNP--PPESSTPQTSRSTSSPLTESTTLLSTLPPAIEMTSTAPP 4253 

Query: 669 SILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMI 728 

S TG S + +P +++PT+T T PT 

Sbjct: 4254 STPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTS AWT- PTPT 4312 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

+ + P L P +V I + AP V G+ + E 

Sbjct: 4313 PLSTPSIIRTTGLRPYPSS VLICCVLNDTYYAPGEEV-YNGTYGDTCYFVNCSLSCTLEF 4371 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

S P + +T +PS ++ S PT P P P +Q++ 
Sbjct: 4372 YNWSCPSTPSPTPTPSKSTPTPSKP — SSTPSKPTPGTKPPECPDFDPPRQEN 4422 
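
The percentages printed in the "Identities" and "Positives" fields of the score headers can be reproduced from the raw counts. The following is a minimal sketch, assuming the reporting program truncates the ratio to a whole number rather than rounding it; that assumption is consistent with the headers in this listing but is not stated in the source. The function name is hypothetical.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Return matches/aligned_length as a whole-number percentage, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    # Integer division drops the fractional part instead of rounding it.
    return (100 * matches) // aligned_length


# Counts copied from the "Score = 265" header above.
identities = truncated_percent(179, 746)
positives = truncated_percent(257, 746)
print(f"Identities = 179/746 ({identities}%), Positives = 257/746 ({positives}%)")
```

With rounding instead of truncation, 179/746 would be reported as 24%, which is why truncation is the assumed behaviour here.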

Score = 254 (38.1 bits), Expect = 8.7e-17, P = 8.7e-17 
Identities = 167/697 (23%), Positives = 245/697 (35%) 

Query: 115 SATPVAVTAPPAHLTPAVPLSFSEGLMKPPPK — PTMPSR-PIAPAPPSTLSLPPKV-PG 170 

3 + T PP TP+ P + + PPP P+ P+ PI P P ST +LPP P 

Sbjct: 1587 SPPTITTTTPPPTTTPSPPTTTTT TPPPTTTPSPPTTTPITP-PTSTTTLPPTTTPS 1642 

Query: 171 QVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHL 230 

T + P + PT+ + TT I + PPP + 

Sbjct: 1643 PPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPI — TTTPSPPTTTMTTPS 1700 

Query: 231 PRGAAAAAVMSSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQS-RPPVTTSNAIPPAV 289 

P SS +TT P+S + P P + PP TT +PP 

Sbjct: 1701 P TTTPSSPITTTTTPSS TTTPSPPPTTMTTPSPTTTPSPPTTTMTTLPPTT 1751 

Query: 290 VATVSATRAQSPVITT-TAAHATDSALSRPTLSIQH PPSAAISIQRPAQSRDVTTR 344 

++ T PITT++++P + + + S ++P + + 

Sbjct: 1752 TSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELIGD 1811 

Query: 345 ITLPSHPALGTPKQQLHTMAQKTI FSTGTPVAAATVAPILATN TIPSATTAGS 397 

+ PA++++ IGV ++N IPA 

Sbjct: 1812 VCGPGWAANISCRATMYP--DVPIGQLGQTVVCDVSVGLICKNEDQKPGGVIPMAFCLNY 1869 

Query: 398 VSHTQAPTSTI — VTMTVPSHSSHATAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSS 455 

+ Q TMT + + + T TT+ I V T T + P + + 

Sbjct: 1870 EINVQCCECVTQPTTMTTTT-TENPTPPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTT 1928 

Query: 456 LIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHTYTPITSSVS-T 513 

+ +P P +T + + P+ + PT P+ T TPIT++ + T 

Sbjct: 1929 TVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTPTTTPITTTTTVT 1977 

Query: 514 IRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQ 572 

PQP+ITTV T QT PPTQ 

Sbjct: 1978 PTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2035 

Query: 573 PAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ--PEGKTSAVVLA 624 

PI T P P GTQ + TPI T P P GTQ P P T+ V 

Sbjct: 2036 TTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT 2094 

Query: 625 DGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPK 683 

T P+P+ TT T +Q+ +T ++ P+ TP 

Sbjct: 2095 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2154 

Query: 684 SEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMIAAASPPSQPAVA 740 

+ TP +T + T P PT Q P T P P+ 

Sbjct: 2155 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 2214 

Query: 741 LSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAV 800 

T P PIT TT P P+ T G+ + P V P P + 

Sbjct: 2215 TQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTPTPTGTQTPTTT- 2267 

Query: 801 PPLATNTVSPS 811 

P T TV+P+ 

Sbjct: 2268 PITTTTTVTPT 2278 

Score = 243 (36.5 bits), Expect = 1.3e-15, P = 1.3e-15 
Identities = 110/406 (27%), Positives = 154/406 (37%) 

Query: 121 VTAP-PAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESS 179 

+T P P TP+ P + + L P P+ P+ PP+T PP T + ++ 

Sbjct: 1396 ITTPSPPTTTPSPPPTTTTTL-PPTTTPSPPTTTTTTPPPTTTPSPPITT — TTTPLPTT 1452 

Query: 180 IPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAV 239 

P P++T + P + TT + P PP + P 
Sbjct: 1453 TPSP PISTTTTPP — PTTTPSPPTTTPSPP TTTPSPPTTTTTTPPP TT 1498 

Query: 240 MSSSKVTTVLRP TSQLPNAATAQPAVQHIIHQPIQSRP-PVTTSNAIPPAVVATVSA 295 

S +TT + P T+ LP TP P+PP TT + PP T+ 

Sbjct: 1499 TPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPP 1558 

Query: 296 TRAQSPVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDV-TTRITLPSHPALG 354 

T SP TTT t S PT + PP+ + P + TT T P P 

Sbjct: 1559 TTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTP — PPTT 1616 

Query: 355 TPKQQLHTMAQKTI FSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVP 414 

TP T +T P T +P T T P TT S T P+ I T T P 

Sbjct: 1617 TPSPPTTTPITPPTSTTTLP-PTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTP 1675 

Query: 415 SHSSHATA-VTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMET 473 

++ ++ 4-TT+ P + TSP PP ++PS SPPMT 

Sbjct: 1676 PPTTTPSSPITTTPSPPTTTM TTPSPTTTPSSPITTTTT-PSSTTTPSPPPTTMTT 1730 

Query: 474 RSDNR-PSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNS 526 

S PS P LP S+ PL T TP+ S++ PS P + 

Sbjct: 1731 PSPTTTPSPPTTTMTTLPPTTTSS-PL TTTPLPPSITPPTFSPFSTTTPTT 1780 

Score = 189 (28.4 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 92/374 (24%), Positives = 133/374 (35%) 

Query: 439 THTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYF-LPTYPPSAY 497 

T+PPP+++P++P PS P+ LPT PS 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSP- 1456 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556 

P++ T P T++ S P S T T +T PM +T AS 

Sbjct: 1457 PISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTT 1516 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P +T P P TP +P T I P +T L P T P P 
Sbjct: 1517 LPPTTTPSPPTTTTTTPPPTTTP SPPTTTPI — TPPTSTTTLPP TTTPSPPP 1566 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP — AQGSSPRPSILRKK 674 

T+ T +P P + P+ T+ T +T +P ++P P+ 

Sbjct: 1567 TTTTT PPPTTTPSP PTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSP 1620 

Query: 675 PATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAV-PPTAQQPPPTI PTMIAA — A 731 

PTP++PT + PT PPT P P I T 

Sbjct: 1621 PTTTPITPPTS — TTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPT 1678 

Query: 732 SPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPV PEIKVK 785 

+ PS P + P TP TT ++P + T S ++ PP P 

Sbjct: 1679 TTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTFSPTTTPS 1738 

Query: 786 EEVEPMDIMRPVSAVPPLATNTVSPSL 812 

M + P + PL T + PS+ 
Sbjct: 1739 PPTTTMTTLPPTTTSSPLTTTPLPPSI 1765 

Score = 185 (27.8 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 71/270 (26%), Positives = 99/270 (36%) 

Query: 563 PAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATP INTQGLQPAPMGTQQPQ PEG 616 

p+p +T P P TP P T + + TP I+T P P T P P 
Sbjct: 1422 PSPPTTTTTTPPPTTTPS-PPITTTTTPLPTTTPSPPISTT-TTPPPTTTPSPPTTTPSP 1479 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ T P + P +P TT + T S +T P SP + P 
Sbjct: 1480 PTTTPSPPTTTTTTPPPTTTP SPPMTTPI-TPPASTTTLPPTTTPSPPTTTTTTPPP 1535 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQ 736 

T P + TP+T T + P+ P T PPPT + PS 

Sbjct: 1536 TTTPSPPT TTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTTPSP 1588 

Query: 737 PAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRP 796 

P + +T P +PP TT PPP+ T ++ + PP + P P 

Sbjct: 1589 PTITTTTPPPTTTPSPPTTT-TTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSP--PP 1645 

Query: 797 VSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

+ P T T SP + T+ PP +P 

Sbjct: 1646 TTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTP 1681 

Score = 183 (27.5 bits), Expect = 3.4e-09, P = 3.4e-09 
Identities = 91/390 (23%), Positives = 139/390 (35%) 

Query: 326 PSAAISIQRPAQSRDVTTR-ITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPIL 384 

PS + P + T TPSPT T I+T TP+ T +P + 

Sbjct: 1399 PSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTPSPPI 1458 

Query: 385 ATNTI PSATTAGSVSHTQAPTSTI VTMTVPSHSSHATAVTTSNI P--VAKVVPQQITHTS 442 

+T T P TT S T P+ T + P+ ++ TT+ P + P T T 

Sbjct: 1459 STTTTPPPTTTPSPP-TTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTPITPPASTTTL 1517 

Query: 443 PRIQPDYPAERSSLIPISGHRASP NPVAMETRSDNRP — SVPVQFQYFLPTYPPSAY 497 

P P++P SP P+T+P+P T PP+ 

Sbjct: 1518 PPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTTPPPTTT 1577 

Query: 498 PLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQL-MTVDASHAR 556 

P T TP +++T P + +P T T +T P +T S 

Sbjct: 1578 PSPPTTTTPSPPTITTTTPPPTTTPSPP TTTTTTPPPTTTPSPPTTTPITPPTSTTT 1634 

Query: 557 HIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEG 616 

P+P T PPTPPPT TT P P 

Sbjct: 1635 LPPTTTPSPPPTTTTTPPPTTTPS--P-PTTTTPSPPITTTTTPPPTTTPSSPITTTPSP 1691 

Query: 617 KTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPA 676 

T+ + T ++PI+ + P++TT + +T +P SP + + P 

Sbjct: 1692 PTTTMTTPSPTTTPSSPITT— TTTPSSTTTPSPPPTTKTTPSPTTTPSPPTTTMTTLPP 1749 

Query: 677 TDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPP 715 

T + P + + P +++ T S + PT P 
Sbjct: 1750 TTTSSPLT TTPLPPSITPPTFSPFSTTTPTTPCVP 1784 

Score = 176 (26.4 bits), Expect = 1.8e-07, P = 1.8e-07 
Identities = 101/402 (25%), Positives = 142/402 (35%) 

Query: 345 ITLPSHPALGTPKQQLHTMAQKTIFSTGT PVAAATVAPILATNTIPSATTAGSVSHTQAP 404 

IT PS P TP T +T +P T P T P TT + T P 

Sbjct: 1396 ITTPSPPTT-TPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTTTP 1454 

Query: 405 TSTIVTMTVPSHSSHATAVTTS-NIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHR 463 

+ I T T P ++ + TT+ + P P T T+P P PI+ 

Sbjct: 1455 SPPISTTTTPPPTTTPSPPTTTPSPPTTTPSPPTTTTTTP — PPTTTPSPPMTTPITPP- 1511 

Query: 464 ASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQA 523 

AS + T PS P T PP+ P + T TPIT ST P + + 

Sbjct: 1512 ASTTTLPPTTT PSPPTTTT TTPPPTTTP-SPPTTTPITPPTSTTTLPPTTTPS 1563 

Query: 524 PNSAITAQ TGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTP 579 

P T T +T +P + T P+P +T P P TP 

Sbjct: 1564 PPPTTTTTPPPTTT PSPPTTTT PS PPTITTTTPPPTT TPSPPTTTTTTPPPTTTP 1618 

Query: 580 G IQPAPLGTQGIHSAT PINTQGLQPAPMGTQQPQPEGKTSAVVLADGATIV 630 

IPPT+TPT PPTP S + 

Sbjct: 1619 SPPTTTPITP-PTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPP 1677 

Query: 631 ANPISNPFSAAPAA-TTVVQTHSQSASTNAP-AQGSSPRPSILRKKPATDGAKPKSEIHV 688 

S+P + P+ TT + T S + + ++P ++P + P T P 
Sbjct: 1678 TTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP T 1734 

Query: 689 SMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPG 746 

+ +P T +M T+ P P PPT + + P+ P V L G 

Sbjct: 1735 TTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITPPTFSPF — STTTPTTPCVPLCNWTG 1790 

Score = 168 (25.2 bits), Expect = 9.3e-08, P = 9.3e-08 
Identities = 89/387 (22%), Positives = 133/387 (34%) 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPI 507 

DY + P+ +P+P T + +PP PTPSP TP 

Sbjct: 1381 DYKIRVNCCWPMDKCITTPSP PTTTPSPP— PTTTTTLPPTTTPSP-PTTTTTTPPP 1434 

Query: 508 TSSVS TIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPA 564 

T++ S T P+ P+ 1+ T +T P T + P+ 
Sbjct: 1435 TTTPSPPITTTTTPLPTTTPSPPISTTTTPPPTTT PSPPTTTPSPPTT TPS 1485 

Query: 565 PISTQGIQPAPIGTPGI-QPAPLGTQGIHSATPINTQGLQPAPMGTQQPQ PEGKTSA 620 

P +T P P TP P+ + P T P TP P T + 

Sbjct: 1486 PPTTTTTTPPPTTTPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTT 1545 

Query: 621 VVLADGATIVANPISNPFSAAPAATTVVQTHSQSA-STNAPAQGS SPRPSILRKKP 675 

+ +T P + P TT T + S +T P+ + +P P+ P 

Sbjct: 1546 PITPPTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPP 1605 

Query: 676 ATDGAKPKSEIHVS — MATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTI PTMIAAASP 733 

TP S TP+T T + P+ P T PPPT + 

Sbjct: 1606 TTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPS-PPPTTTTTPPPTTTPSPPTTTT 1664 

Query: 734 PSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGP PVPEIKVKEEVE 789 

PS P +T P + PITT + P ++T ++ P P 

Sbjct: 1665 PSPPITTTTTPPPTTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPP 1724 

Query: 790 PMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

P + P P T +L ++T+ LPP +P 

Sbjct: 1725 PTTMTTPSPTTTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITP 1767 

Score = 154 (23.1 bits), Expect = 2.7e-06, P = 2.7e-06 
Identities = 70/277 (25%), Positives = 92/277 (33%) 

Query: 565 PISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAVVLA 624 

PIST PPTPPPT + TP PTPPT + 

Sbjct: 1457 PISTT-TTPPPTTTPS--P-PTTTPSPPTTTPSPPTTTTTTPPPTTTPSPPMTTP — ITP 1510 

Query: 625 DGATI VANPISNPFSAAPAATTVVQTHSQSASTNAP AQGSSPRPSILRKKPATDGA 680 

+T P + P TT T + S T P ++ P+ P T 

Sbjct: 1511 PASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTT 1570 

Query: 681 KPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQ--PPPTIPTMIAAASPPSQPA 738 

P S T T S T + + T PPT PPPT T + P P 

Sbjct: 1571 TPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTT-TPSPPTTTPITPP 1629 

Query: 739 VALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVS 798 

+ +T+P +PP TT PPP+ T ++ PP+ + 

Sbjct: 1630 TSTTTLPPTTTPS PPPTT-TTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTT 1688 

Query: 799 AVPPLATNTV SPSLALLANNL— SMPTSDLPPGASPRKKP 836 

PP T T +PS + S T PP P 

Sbjct: 1689 PSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP 1733 

Score = 148 (22.2 bits), Expect = 1.1e-05, P = 1.1e-05 

Identities = 62/254 (24%), Positives = 89/254 (35%) 

Query: 583 PAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAV VLADGATIVANPISNP 637 

P+P T SPTLP TPPT+ +T P+ 

Sbjct: 1399 PSPPTTTP — SPPPTTTTTLPP TTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTT 1452 

Query: 638 FSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKKPATDGAKPKSEIHVS--MATPVT 695 

+ P +TT T+++P SPP+ PT P SM TP+T 

Sbjct: 1453 TPSPPISTTT--TPPPTTTPSPPTTTPSP-PTTTPSPPTTTTTTPPPTTTPSPPMTTPIT 1509 

Query: 696 VSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPIT 755 

T + P+ T PP T P+ + P P + +T+P +PP T 

Sbjct: 1510 PPASTTTLPPTTTPSPPTTTTTTPPPTTTPS — PPTTTPITPPTSTTTLPPTTTPSPPPT 1567 

Query: 756 TIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALL 815 

T PPP+ T ++ PP + PP T P+ + 

Sbjct: 1568 T-TTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPI 1626 

Query: 816 ANNLSMPTSDLPPGASPRKKP 836 

S T+ LPP +P P 
Sbjct: 1627 TPPTS — TTTLPPTTTPSPPP 1645 

Score = 131 (19.7 bits), Expect = 1.2e-03, P = 1.2e-03 
Identities = 112/492 (22%), Positives = 174/492 (35%) 


Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3977 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 4036 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 4037 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 4095 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T T P I 

Sbjct: 4096 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 4155 

Query: 269 IHQPIQSRPPVTTSNAIPPA--VVATVSATRAQSPVITTTA--AHATDSALSRPTLSIQH 324 

+ PT P + T + T +PTT H + + ++TS 

Sbjct: 4156 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPP 4215 

Query: 325 PPSAAISIQRPAQS--RDVTTRI-TLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVA 381 

P S+ R S + TT + TLP PA+ + T T + T T++ 

Sbjct: 4216 PESSTPQTSRSTSSPLTESTTLLSTLP--PAI EMTSTAPPSTPTAPTTTSGGHTLS 4269 

Query: 382 PILATNTIPSAT-TAGSVS-HTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVVPQQIT 439 

P +TTP TTG++ + APT + V T S A T + P++ P I 

Sbjct: 4270 PPPSTTTSPPGTPTRGTTTGSSSAPTPSTVQTTTTS AWTPTPTPLS--TPSIIR 4321 

Query: 440 HTSPRIQPDYPAERSSLIPISGHRASPNP-VAMETRSDN RPSVPVQFQYFLPTYP- 493 

T ++P YP+ ++ +P V T D S+ +++ + P 

Sbjct: 4322 TTG--LRP-YPSSVLICCVLNDTYYAPGEEVYNGTYGDTCYFVNCSLSCTLEFYNWSCPS 4378 

Query: 494 -PSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDA 552 

PSP+ + TPSS+ P P TL +T 

Sbjct: 4379 TPSPTPTPSKS-TPTPSKPSSTPSKPTPGTKPPECPDFDPPRQENETWWLCDCFMATCKY 4437 

Query: 553 SHARHIQGIQ PAPISTQGIQPAPIGTP 579 

++ I + + P P + G+QP + P 

Sbjct: 4438 NNTVEIVKVECEPPPMPTCSNGLQPVRVEDP 4468 


Score = 117 (17.6 bits), Expect = 1.8e-02, P = 1.8e-02 
Identities = 41/156 (26%), Positives = 55/156 (35%) 

Query: 710 TIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGG 769 

T + P T PPPT T + + PS P +T P +PPITT P P+ T 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITT-TTTPLPTTTPSP 1456 

Query: 770 SLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPG 829 

+ S+ PP P P + P T T SP T+ PP 

Sbjct: 1457 PISTTTTPP PTTTPSPPTTTPSPPTTTPSPPTTTTTTP-PPTTTPSPPM 1504 

Query: 830 ASPRKKPRKQQHVISTEEGDMMETNSTDDEKSTAKS 865 

+ P P + T T +T +T S 

Sbjct: 1505 TTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPS 1540 

Score = 61 (9.2 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 23/93 (24%), Positives = 41/93 (44%) 

Query: 397 SVSHTQAPTSTIVTMTVPSHSSHATAVTTSNIPVAKVV PQQITHTSPRIQPDYPAE 452 

S + + + +T T+T + P+ + T TT+- P + V P+ SI D+P+ 

Sbjct: 1257 SITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKLCCLWSDWINEDHPSS 1316 

Query: 453 RSSLIPISGHRASPNPVAMETRSDNRPSVPVQ 484 

S P G + P + E RS P + ++ 

Sbjct: 1317 GSDDGDREPFDGVCGAPEDI--ECRSVKDPHLSLE 1349 

Score = 50 (7.5 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 16/41 (39%), Positives = 19/41 (46%) 

Query: 334 RPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTP 374 

RP+ TT ITLP+ P T T T+ ST TP 

Sbjct: 1261 RPSTLTTFTT-ITLPTTPTSFTTTTTTTTPTSSTVLST-TP 1299 

Score = 46 (6.9 bits), Expect = 5.4e-08, P = 5.4e-08 
Identities = 24/106 (22%), Positives = 37/106 (34%) 

Query: 324 HPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKTIFSTGTPVAAATVAPI 383 

+ PP A++ ++ST+PGQA G I 

Sbjct: 1196 YPPGASVPTEETCKSCVCTNSSQVVCRPEEGKILNQTQDGAFCYWEICGPNGTVEKHFNI 1255 

Query: 384 LATNTIPSA-TTAGSVSHTQAPTSTIVTMTVPSHSSHATAVTTSNI 428 

+ T PS TT +++ PTS T T + +S TT + 

Sbjct: 1256 CSITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKL 1301 

Score = 44 (6.6 bits), Expect = 8.7e-08, P = 8.7e-08 
Identities = 14/34 (41%), Positives = 17/34 (50%) 

Query: 478 RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV 511 

RPS F LPT P S + T TP +S+V 

Sbjct: 1261 RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV 1294 
Pedant information for DKFZphtes3_2all, frame 2 

Report for DKFZphtes3_2all.2 

[LENGTH] 1048 
[MW] 110324.04 
[pI] 9.83 
[HOMOL] PIR: 147141 gastric mucin (clone PGM-2A) - pig (fragment) 8e-15 
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-09 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR420w] 4e-09 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR420w] 4e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 4e-06 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGR014w] 1e-05 
[FUNCAT] 11.01 stress response [S. cerevisiae, YHL028w] 1e-04 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YHL028w] 1e-04 
[EC] 3.2.1.3 Glucan 1,4-alpha-glucosidase 3e-08 
[PIRKW] glycosidase 3e-08 
[PIRKW] transmembrane protein 3e-08 
[PIRKW] polysaccharide degradation 3e-08 
[PIRKW] glycoprotein 9e-08 
[PIRKW] calcium binding 9e-08 
[PIRKW] hydrolase 3e-08 
[PIRKW] cytoskeleton 7e-08 
[SUPFAM] equine herpesvirus glycoprotein X 2e-07 
[SUPFAM] yeast glucan 1,4-alpha-glucosidase homolog 3e-08 
[SUPFAM] polymorphic epithelial mucin 7e-08 
[SUPFAM] glucan 1,4-alpha-glucosidase homology 3e-08 
[SUPFAM] equine herpesvirus 1 glycoprotein homology 2e-07 
[PROSITE] MYRISTYL 9 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 3 
[KW] Irregular 
[KW] LOW_COMPLEXITY 20.04 % 





SEQ MGPPRHPQAGEIEAGGAGGGRRLQVEMSSQQFPRLGAPSTGLSQAPSQIANSGSAGLINP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeeccccccccccccccccccccccccccccccccc 

SEQ AATVNDESGRDSEVSAREHMSSSSSLQSREEKQEPVVVRPYPQVQMLSTHHAVASATPVA 

SEG xxxxx xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ VTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESSI 

SEG xxxxxxxxxxxxx xxxxxxxxxx . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccceeeccccc 

SEQ PQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAVM 

SEG xxxxx . . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQSRPPVTTSNAIPPAVVATVSATRAQS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQL 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHA 

SEG xxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPS 

SEG xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeecccccccc 

SEQ VPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQ 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GLQPAPMGTQQPQPEGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AQGSSPRPSILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccceeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ EIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQ 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccrccccccc 

SEQ HVISTEEGDMMETNSTDDEKSTAKSLLVKAEKRKSPPKEYTDEEGVRYVPVRPRPPITLL 

SEG xxxxxxxxxxx . . . . 

PRD ccccccccccccccccccccchhhhhhhhhccccccccccccccccccccccccccccee 

SEQ RHYRNPWKAAYHHFQRYSDVRVKEEKKAMLQEIANQKGVSCRAQGWKVHLCAAQLLQLTN 

SEG 

PRD eeccccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeecccceeehhhhhhhhhhc 

SEQ LEHDVYERLTNLQEGIIPKKKAATDDDLHRINELIQGNMQRCKLVMDQISEARDSMLKVL 

SEG 

PRD cchhhhhhhhhhhceeeeccocccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DHKDRVLKLLNKNGTVKKVSKLKRKEKV 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhccccceeeeeeeeccccc 



Prosite for DKFZphtes3_2all.2 

PS00001 818->822 ASN_GLYCOSYLATION PDOC00001 

PS00001 854->858 ASN_GLYCOSYLATION PDOC00001 

PS00001 1033->1037 ASN_GLYCOSYLATION PDOC00001 

PS00004 872->876 CAMP_PHOSPHO_SITE PDOC00004 

PS00004 1037->1041 CAMP_PHOSPHO_SITE PDOC00004 

PS00005 68->71 PKC_PHOSPHO_SITE PDOC00005 

PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005 

PS00005 242->245 PKC_PHOSPHO_SITE PDOC00005 

PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 

PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005 

PS00005 442->445 PKC_PHOSPHO_SITE PDOC00005 

PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005 

PS00005 665->668 PKC_PHOSPHO_SITE PDOC00005 

PS00005 831->834 PKC_PHOSPHO_SITE PDOC00005 

PS00005 862->865 PKC_PHOSPHO_SITE PDOC00005 

PS00005 940->943 PKC_PHOSPHO_SITE PDOC00005 

PS00005 1035->1038 PKC_PHOSPHO_SITE PDOC00005 

PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006 

PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 

PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006 

PS00006 88->92 CK2_PHOSPHO_SITE PDOC00006 

PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 

PS00006 473->477 CK2_PHOSPHO_SITE PDOC00006 

PS00006 844->848 CK2_PHOSPHO_SITE PDOC00006 

PS00006 855->859 CK2_PHOSPHO_SITE PDOC00006 

PS00006 959->963 CK2_PHOSPHO_SITE PDOC00006 

PS00006 984->988 CK2_PHOSPHO_SITE PDOC00006 

PS00008 15->21 MYRISTYL PDOC00008 

PS00008 16->22 MYRISTYL PDOC00008 

PS00008 36->42 MYRISTYL PDOC00008 

PS00008 233->239 MYRISTYL PDOC00008 

PS00008 372->378 MYRISTYL PDOC00008 

PS00008 533->539 MYRISTYL PDOC00008 

PS00008 535->541 MYRISTYL PDOC00008 

PS00008 590->596 MYRISTYL PDOC00008 

PS00008 768->774 MYRISTYL PDOC00008 

PS00009 19->23 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_2all.2) 

DKFZphtes3_2al7 



group: metabolism 



DKFZphtes3_2al7 encodes a novel 574 amino acid protein without similarity to known proteins. 

The novel protein contains a thiol protease cys pattern. Eukaryotic thiol proteases (EC 
3.4.22.-) are a family of proteolytic enzymes containing an active site cysteine. Cathepsins 
belong to this protease family. 

The new protein can find application in modulation of proteolytic processes and as a new 
enzyme for proteomic analysis and biotechnological production processes. 



unknown 



complete cDNA, complete cds, EST hits 
Sequenced by EMBL 
Locus: unknown 



Insert length: 2312 bp 

Poly A stretch at pos. 2300, polyadenylation signal at pos. 2273 



1 GTTTTCACCT GATCATTAGA AACTAATGAA ACACC7TTTA AGTCTTATGA 
51 ATTCAGGTTA CACTGTTTTC CAGATGCCTT GGCAGCTGGT ACAGGGCCTC 
101 TGAAAAATGG AACCAAATTC TCTGAGGACT AAAGTCCCAG CTTTCTTATC 
151 TGATTTGGGG AAGGCCACAT TGAGGGGAAT CAGAAAGTGT CCCCGATGTG 
201 GCACATACAA TGGAACCCGG GGACTGAGCT GTAAGAACAA GACATGTGGA 
251 ACCATATTCC GCTACGGTGC ACGCAAGCAG CCTAGTGTTG AAGCTGTCAA 
301 AATCATTACA GGCTCTGATC TTCAGGTCTA CTCAGTGCGG CAAAGAGACC 
351 GGGGCCCTGA TTACCGATGC TTTGTGGAGC TCGGGGTTTC AGAGACAACA 
401 ATCCAGACAG TGGATGGGAC GATCATCACT CAGCTGAGCT CTGGACGGTG 
451 TTATGTCCCC TCATGCCTGA AAGCTGCCAC TCAAGGCGTT GTGGAAAACC 
501 AGTGCCAGCA CATCAAGCTG GCGGTGAACT GCCAGGCAGA GGCCACCCCT 
551 CTGACCCTGA AGAGCTCGGT CCTGAATGCA ATGCAGGCCT CCCCGGAAAC 
601 CAAACAGACC ATCTGGCAGT TGGCCACGGA ACCCACAGGT CCTCTGGTGC 
651 AGAGAATTAC TAAAAACATC TTGGTGGTGA AATGCAAGGC AAGCCAGAAG 
701 CACAGTTTGG GGTATTTGCA TACATCTTTT GTGCAGAAAG TCAGTGGCAA 
751 AAGCTTGCCT GAGCGCCGCT TCTTCTGCTC CTGTCAGACT CTGAAATCGC 
801 ACAAGTCAAA TGCCTCCAAG GATGAGACAG CCCAGAGATG CATTCATTTC 
851 TTTGCTTGCA TCTGTGCCTT TGCCAGTGAT GAGACACTGG CTCAGGAATT 
901 CTCAGACTTC CTAAATTTTG ATTCCAGCGG TCTTAAAGAG ATTATTGTAC 
951 CCCAGTTAGG TTGCCATTCA GAATCAACAG TATCIGCTTG TGAGTCTACT 
1001 GCCTCTAAGT CAAAGAAGAG GAGAAAGGAT GAAGTATCTG GTGCACAGAT 
1051 GAACAGTTCA CTACTGCCTC AAGATGCAGT GAGCAGTAAT CTAAGGAAAA 
1101 GTGGCCTGAA AAAGCCTGTG GTTGCTTCCT CGTTAAAAAG GCAGGCCTGT 
1151 GGTCAGCTGT TAGATGAGGC ACAAGTGACT TTATCCTTCC AAGACTGGCT 
1201 GGCCAGTGTC ACAGAACGCA TCCATCAAAC CATGCACTAT CAGTTTGATG 
1251 GCAAACCAGA ACCATTGGTG TTCCACATTC CTCAGTCATT TTTTGATGCC 
1301 CTGCAACAAA GAATATCTAT AGGAAGTGCA AAAAAACGGC TCCCCAACTC 
1351 CACCACAGCT TTTGTTCGGA AAGATGCCTT GCCACTGGGA ACCTTTTCCA 
1401 AGTATACTTG GCATATCACT AATATCCTGC AAGTTAAACA AATCTTAGAT 
1451 ACCCCAGAGA TGCCCTTGGA AATCACCCGT AGCTTTATCC AGAACCGAGA 
1501 TGGGACTTAT GAGCTATTTA AATGCCCTAA AGTGGAAGTA GAAAGCATAG 

1551 CAGAAACCTA CGGTCGTATA GAAAAACAAC CAGTGCTGCG ACCCTTGGAA 
1601 CTAAAAACTT TTCTCAAAGT TGGCAACACT TCCCCAGATC AAAAGGAGCC 
1651 AACACCTTTC ATCATCGAGT GGATCCCAGA TATCCTTCCC CAATCTAAGA 

1701 TTGGCGAGCT GCGGATCAAG TTTGAGTATG GCCACCACCG GAATGGGCAT 
1751 GTGGCGGAGT ACCAAGACCA GCGGCCCCCC TTGGACCAGC CCTTGGAACT 

1801 GGCCCCTCTG ACCACTATTA CTTTCCCTTA AAGCAAAACA AGATAATAAT 
1851 CTTTTGCTGC TTAATTTGCA CATCCCCACC CCTTGACAAC TTTAAATGCT 
1901 AGTTAGGCAC TTAGATGGCC CTGTTCCTTG GTAAACTGCT CTTAGCTAAG 
1951 ATGCAAATTC TCAGTGCTTT CAAGTGGATT CTGTTGAAGA AAATCTCTTG 
2001 TAAATAGCCT TTTTGATGCT GCTGTGTACA GTCTTCATTA TGCATTGGGC 
2051 AGTATTTCTG GCTAGAGTTT TAAAAGGAAC AGAAAGAAAA CCAGCTTATT 
2101 TTCCTTCTTA CGGACTCATC TTTAGCGTTT ATTTCAACCT TTTGCTAATT 
2151 CTCTGAGAAA TCTGCAGCAC TCAGCCATAC ACCAACAGTG TTGGAAAGTT 
2201 AACACCCTGG TTAGGGCAGA ATGTTAAAGA CCATCTTGGC AGAGTTCCAG 
2251 CCACGCTCTT TATTCTGTTC TCAAATAAAG CAGTGTCACT AGTTTTTCCT 
2301 AAAAAAAAAA AA 
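
The poly A stretch and polyadenylation signal positions reported for this insert can be checked against the final bases of the sequence above. The following is a minimal sketch, assuming that the signal meant is the canonical AATAAA hexamer (the report does not name the motif) and that a run of ten A residues is enough to mark the poly A stretch. Under those assumptions the reported values 2273 and 2300 agree with zero-based offsets into the insert. The constant and function names are hypothetical.

```python
# Last 62 bases of the insert (1-based positions 2251-2312), copied from the listing above.
TAIL_START = 2251  # 1-based position of the first base in TAIL
TAIL = (
    "CCACGCTCTT" "TATTCTGTTC" "TCAAATAAAG" "CAGTGTCACT" "AGTTTTTCCT"
    "AAAAAAAAAA" "AA"
)


def zero_based_offset(motif: str) -> int:
    """Zero-based offset of the first occurrence of motif, counted from the start of the insert."""
    index = TAIL.find(motif)
    if index < 0:
        raise ValueError(f"{motif!r} not found in the tail fragment")
    # TAIL_START is 1-based, so subtract 1 to express the result as a zero-based offset.
    return (TAIL_START - 1) + index


signal_offset = zero_based_offset("AATAAA")   # assumed canonical polyadenylation signal
poly_a_offset = zero_based_offset("A" * 10)   # assumed minimum run length for the poly A stretch
print(signal_offset, poly_a_offset)
```

In 1-based numbering, as used for the left-hand position column of the sequence listing, the same two features start at 2274 and 2301.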



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 1828 bp; peptide length: 574 
Category: putative protein 
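
The stated peptide length follows from the ORF coordinates, and the first residues of the peptide can be recovered from the start of the ORF. The following is a minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon and the standard genetic code; the function names are hypothetical, and the 30-base fragment is bases 107-136 of the insert sequence listed above.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def orf_codon_count(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna: str) -> str:
    """Translate complete codons of an uppercase DNA string, reading from its first base."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 107-136 of the insert: the first ten codons of the ORF.
ORF_START_FRAGMENT = "ATGGAACCAAATTCTCTGAGGACTAAAGTC"

print(orf_codon_count(107, 1828))     # codons between 107 bp and 1828 bp
print(translate(ORF_START_FRAGMENT))  # first ten residues of the peptide
```

The codon count equals the stated peptide length of 574, and the translated fragment matches the first ten residues of the peptide listed below.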



1 MEPNSLRTKV PAFLSDLGKA TLRGIRKCPR CGTYNGTRGL SCKNKTCGTI 

51 FRYGARKQPS VEAVKIITGS DLQVYSVRQR DRGPDYRCFV ELGVSETTIQ 

101 TVDGTIITQL SSGRCYVPSC LKAATQGVVE NQCQHIKLAV NCQAEATPLT 

151 LKSSVLNAMQ ASPETKQTIW QLATEPTGPL VQRITKNILV VKCKASQKHS 

201 LGYLHTSFVQ KVSGKSLPER RFFCSCQTLK SHKSNASKDE TAQRCIHFFA 

251 CICAFASDET LAQEFSDFLN FDSSGLKEII VPQLGCHSES TVSACESTAS 

301 KSKKRRKDEV SGAQMNSSLL PQDAVSSNLR KSGLKKPVVA SSLKRQACGQ 

351 LLDEAQVTLS FQDWLASVTE RIHQTMHYQF DGKPEPLVFH IPQSFFDALQ 

401 QRISIGSAKK RLPNSTTAFV RKDALPLGTF SKYTWHITNI LQVKQILDTP 

451 EMPLEITRSF IQNRDGTYEL FKCPKVEVES IAETYGRIEK QPVLRPLELK 

501 TFLKVGNTSP DQKEPTPFII EWIPDILPQS KIGELRIKFE YGHHRNGHVA 
551 EYQDQRPPLD QPLELAPLTT ITFP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2al7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2al7, frame 2 



Report for DKFZphtes3_2al7.2 

[LENGTH] 574 
[MW] 64076.89 
[pI] 9.15 
[PROSITE] MYRISTYL 5 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] PKC_PHOSPHO_SITE 14 
[PROSITE] ASN_GLYCOSYLATION 5 
[PROSITE] THIOL_PROTEASE_CYS 1 
[KW] Alpha_Beta 


SEQ MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS 

PRD ccccccccccchhhhhcccchhhhhcccccccccccccccccccccccceeeeccccccc 

SEQ VEAVKIITGSDLQVYSVRQRDRGPDYRCFVELGVSETTIQTVDGTIITQLSSGRCYVPSC 

PRD ceeeeeeecccceeeeeccccccccceeeeeecccccceeeccceeeeeecccccccchh 

SEQ LKAATQGVVENQCQHIKLAVNCQAEATPLTLKSSVLNAMQASPETKQTIWQLATEPTGPL 

PRD hhhhhhhhcchhhhheeehhhhhhhcccccchhhhhhhhhcccchhhhhhhhhcccccch 

SEQ VQRITKNILVVKCKASQKHSLGYLHTSFVQKVSGKSLPERRFFCSCQTLKSHKSNASKDE 

PRD hhhhhhheeeeeecccccccccccceeeeeeecccccccceeeecccccccccccccccc 

SEQ TAQRCIHFFACICAFASDETLAQEFSDFLNFDSSGLKEIIVPQLGCHSESTVSACESTAS 

PRD hhhhhhhhhhhhhhhhhchhhhhhhhhhhccccccceeeeeecccccccceeeccccccc 

SEQ KSKKRRKDEVSGAQMNSSLLPQDAVSSNLRKSGLKKPVVASSLKRQACGQLLDEAQVTLS 

PRD ccchhhhhccccccccccccccccchhhhhhhccccceeehhhhhhhhhchhhhhhhhhh 

SEQ FQDWLASVTERIHQTMHYQFDGKPEPLVFHIPQSFFDALQQRISIGSAKKRLPNSTTAFV 

PRD hhhhhhhhhhhhhhhhhhhcccccccceeehhhhhhhhhhhhhhhhcccccccccceeee 

SEQ RKDALPLGTFSKYTWHITNILQVKQILDTPEMPLEITRSFIQNRDGTYELFKCPKVEVES 

PRD ecccccccccceeeeehhhhhhhhhhhccccccccceeeeeeccccceeeecccceeeeh 

SEQ IAETYGRIEKQPVLRPLELKTFLKVGNTSPDQKEPTPFIIEWIPDILPQSKIGELRIKFE 

PRD hhhhhhhhhccccccccccceeeeecccccccccccceeeeecccccccccccceeeeee 

SEQ YGHHRNGHVAEYQDQRPPLDQPLELAPLTTITFP 
PRD ecccccceeeeccccccccccccccccceeeccc 



Prosite for DKFZphtes3_2al7.2 



PS00001 35->39 ASN_GLYCOSYLATION PDOCOOOOl 

PS00001 44->48 ASN_GLYCOSYLATION PDOC00001 

PS00001 235->239 ASN_GLYCOSYLATION PDOC00001 

PS00001 316->320 ASN_GLYCOSYLATION PDOCOOOOl 

PS00001 414->418 ASN_GLYCOSYLATION PDOCOOOOl 

PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 

PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005 

PS00005 41->44 PKC_PHOSPHO_SITE PDOC00005 

PS00005 76->7 9 PKC_PHOSPHO_SITE PDOC00005 

PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005 

PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005 

PS00005 196->199 PKC_PHOSPHO_SITE PDOC00005 

PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 

PS00005 228->231 PKC_PHOSPHO_SITE PDOC00005 

PS00005 231->234 PKC_PHOSPHO_SITE PDOC00005 

PS00005 302->305 PKC_PHOSPHO_SITE PDOC00005 

PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 

PS00005 369->372 PKC_PHOSPHO_SITE PDOC00005 

PS00005 407->410 PKC_PHOSPHO_SITE PDOC00005 

PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006 

PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006 

PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006 

PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 

PS00006 360->364 CK2_PHOSPHO_SITE PDOC00006 

PS00006 367->371 CK2_PHOSPHO_SITE PDOC00006 

PS00006 394->398 CK2_PHOSPHO_SITE PDOC00006 

PS00006 480->484 CK2_PHOSPHO_SITE PDOC00006 

PS00006 508->512 CK2_PHOSPHO_SITE PDOC00006 

PS00008 32->38 MYRISTYL PDOC00008 

PS00008 93->99 MYRISTYL PDOC00008 

PS00008 104->110 MYRISTYL PDOC00008 

PS00008 127->133 MYRISTYL PDOC00008 

PS00008 312->318 MYRISTYL PDOC00008 

PS00139 109->121 THIOL_PROTEASE_CYS PDOC00126 



(No Pfam data available for DKFZphtes3_2al7.2) 



DKFZphtes3_2dl5 



group: testes derived 

DKFZphtes3_2dl5 encodes a novel 274 amino acid protein with similarity to 
C. elegans cosmid F25H2.1. 

The novel protein contains a Pfam predicted C2-domain. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C.elegans F25H2.1 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 
Locus: unknown 



Insert length: 3615 bp 

Poly A stretch at pos. 3603, polyadenylation signal at pos. 3578 



1 GCGGCGGCCT CGAGGTGACA ACTGTCTCCG TCGCAGGCTC CGGCGGGGGC 
51 GCAGGAGGTC GCCCGGCGCG TCACTGTCGG GTCGGCGAGC CACGGGGGCC 
101 GCCGCAGCAC CATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC 
151 ATCGGTGAGC TCCCGCAGGA CTTCCTCCGC ATCACGCCCA CACAGCAGCA 
201 GCGGCAGGTC CAGCTGGACG CCCAGGCGGC CCAGCAGCTG CAGTACGGAG 
251 GCGCAGTGGG CACCGTGGGC CGACTGAACA TCACGGTGGT ACAGGCAAAG 
301 TTGGCCAAGA ATTACGGCAT GACCCGCATG GACCCCTACT GCCGACTGCG 
351 CCTGGGCTAC GCGGTGTACG AGACGCCCAC GGCACACAAT GGCGCCAAGA 
401 ATCCCCGCTG GAATAAGGTC ATCCACTGCA CGGTGCCCCC AGGCGTGGAC 
451 TCTTTCTATC TCGAGATCTT CGATGAGAGA GCCTTCTCCA TGGACGACCG 
501 CATTGCCTGG ACCCACATCA CCATCCCGGA GTCCCTGAGG CAGGGCAAGG 
551 TGGAGGACAA GTGGTACAGC CTGAGCGGGA GGCAGGGGGA CGACAAGGAG 
601 GGCATGATCA ACCTCGTCAT GTCCTACGCG CTGCTTCCAG CTGCCATGGT 
651 GATGCCACCC CAGCCCGTGG TCCTGATGCC AACAGTGTAC CAGCAGGGCG 
701 TTGGCTATGT GCCCATCACA GGGATGCCCG CTGTCTGTAG CCCCGGCATG 
751 GTGCCCGTGG CCCTGCCCCC GGCCGCCGTG AACGCCCAGC CCCGCTGTAG 
801 CGAGGAGGAC CTGAAAGCCA TCCAGGACAT GTTCCCCAAC ATGGACCAGG 
851 AGGTGATCCG CTCCGTGCTG GAAGCCCAGC GAGGGAACAA GGATGCCGCC 
901 ATCAACTCCC TGCTGCAGAT GGGGGAGGAG CCATAGAGCC TCTGCCTCGA 
951 TGCCGTTTTG CCCCCGCTCT TTGGACACGC CGACCCGGCG CTCCCCAAGG 
1001 AATGCTGTCC CAACAAGATT CCCGTGAAAG AGCACCCGTG TCGCCCCCTC 
1051 CCGTGGACTT CTGTGCCGCC CCGTCCACAC CTGTTCTTGG GTGCATGTGG 
1101 GTTTTCGGTT CCTGGCGGTC CAGGACGGGG CGGGGGCTCC CCTCCCATCT 
1151 CGTGCTGGGA GGTCTCAGCG CGCTCTCCTG TCCCTGGGAC GTGCGTCTCT 
1201 CCTTCTCATG CCGTTCTGGA AAATGCTCTT GCTGTAGAGA GCAGCTGCTT 
1251 CTGCCAGGGT GTTGGAGGTG GTGGAGCGCC TTCCGATTCC ATTCATGGCA 
1301 TTTTGTGATG TGATGTAATT GGAATAGAGC TGTTGATTTA AGGCACACAC 
1351 AATCCCTCAC ACTGTGGGTT TTTTTTAGAA CTTCCCAGAC GAAAACTCAC 
1401 GCCCTTGCCC TAACGCGCTT TGCTGTGAGC CTGGCCCCTG CCCAGGGCTT 
1451 GGGTCTGGTG AGCTGAGCAG CTTCCTGTGG ATGGTGTGGG GCCGGCCTCT 
1501 GGCCTGGCTC ACCTGGCCAC TGTCCAGCCA GCCTTGTGAC AGACTCCGGC 
1551 CTGAAGGCAG AATGAACCCA CACCTGGAGT GAGGAAGGGG GCCTGGCACG 
1601 GTTGGCCAGG CTCTGCCTGA TTGCCAGCCA GCGGGCATCT GAAGCCGGGT 
1651 CCTTCGCCCG CCGGAGGCTG CCGTCCGTCT CTCCTGCTGC GCTCGTGCCA 
1701 GCTCCGTGGG TGTCCTCCCA GGGAGCTTCT CTTCTCAACA GGCCTTGCGA 
1751 GGCTGGGGTG AGAGGTGATA GAGGCAGCAC TGTGCATGAT TCCGAGAGGG 
1801 TGTGGTGGCA CTGCCAGCCG ACTGCTGACA GCTTGGGAGC TGCTGTGCCC 
1851 AGGACGTGGG TTCAGCGTGG GCGAGGAAAG CCTGGCGAGC GTGGCCCTGT 
1901 AAAAGCTTTC TGAGGCGGGA GGCGCTCACT TACCTCTGAC TGCCTGGGCG 
1951 CTGCGTGTAG CATCTTGGCC TACAGGACAG ATTTTAGGTG ACACCTGGTT 
2001 ATGACAGTCA GAAATTTGAG AAGCTTCTCA CAAGTGATGC ACTTTAAATA 
2051 ATCTGCATGC CATTGAGACA CCTGCATGTC TGGTGTTTGT GGTTCAAGTG 
2101 TCTTGCCGCC GGCCTTCGGA TGTAAACCCA CTGATAACGG ACAGAAAGAG 
2151 AATGCCCACA AGTGGGTCTT CTGTGGAAGA TGCAGAAGGA GGAAGTTAGT 
2201 GCTTACATTT TAGTCTTTTT CTCCCTCAAA AAAATAGGTT AAGTTTCAGT 
2251 GCCAGCTAGA AAATACTGCT TTCTGCCATC GATTGGGGGT GGTTTTTGTC 
2301 AAATATACTG TTGATAAATA TTTATTTTTG TAAACTTGAA GTGTGTGGTG 
2351 GCCGTGGGGG AGGGACATGC TGGCAGCAGG CGCCTTCTTC AGCTGTGGGT 
2401 CCTAAAGGCC TTTGATCCTT TGAAGAAGAA AGACATGGTA TTTGTTCAGC 
2451 AGACGCCGAC CACTCAGACG GAGGGGCCCC TGGGATTCCC TGTCTCAGAT 
2501 GGCCTGGTCT TACGCCTGTG TAGATTTCTT CTCCATTGGG AATGAAGGTG 
2551 TCAGGCGGGA CTGGAACGTT CTAGATGGTA TGTTCCGTGA TATTAACAAC 
2601 TCTAACCCAG GACAGACCAC AAGCCACACT CAGAGGCCTC ACTGTGCTGG 
2651 GGGCTTCGGT GTCCAGGCGC CCAGGTGTGG CCACCAGCAC CGGTTTCTGC 
2701 CTTCGCGTTG CTGGGGTGCA GTGAGACTGC CACACGCGTG CACATGTGGC 
2751 TCTGTGGGTG TCTCCTAGAG AGGACGTGGC CCCTGCTGCC AGCCCTTGAG 
2801 CAGCCCGTGT GGGGGCCCGA GGGACCCACA CAGTGGGGGC CAGCCTCGCT 
2851 GGAGGGAGAG CAACCCTTTG CCGATGACCA CGCTTGCCGC CATCTCTTAG 
2901 TTTTCTTTTT CACAAGCGCT TTATTTTTTT AATAGACAAA TCACATTTTG 
2951 CAAGGCCTTT AATTAAATAA GATTCTTCTT TCCTTCATTT TATGCTTTAT 
3001 TTCCTGTTTG AAGGCTTACT GTAGAAGTGG CTTACTGTAG AAGCAGCTTG 
3051 CTGAGCCCCT CCGAGCGGTC CCCAGAATTA GCTGGTTCAC AACCCCCACC 
3101 CTCCCCCGCC CCCGCCTGTG TCAGGTGTGG ATGAGGTCGT CACACTCAGA 
3151 AGGACAGGCT TGTCTGCCAG CTCACAAGGG GAGGCTGCAG TGGGTTTGGG 
3201 AGCTGGGTTT AGGCCCCTGG TGTCTGAGGG CCCAGGCCTT GCCAGCCTCT 
3251 GCTGCTCCTG CTCCTGGGTT TGAAGATGCA GGCCGATCGC CAGCTCCGTG 
3301 GCAGCGGTCA CTAAGGACAG CCTGACTGTG CCATCTTGGA GCCTCAGGCG 
3351 GGGCTCCGGA GATAGAAGAC AGGTCGCCGG AGGCTCCCCC TCCTCTCCTC 
3401 TCCCCTCTGC AGATGCTCCC TGGGCGCTAC CCTGCAGGGT GCCAGGCAGG 
3451 AGTGGTCTCA GAACGTGCGC TTCTGATTAT TTTACTGGGG TCCATTGTCC 
3501 AGATTTTTCT TTGATTGTAA AATATATTTT TACTTTTTAG TCTTCTAATT 
3551 TAATAAATGA TCCATATAAA AATAGAGAAA TAAAGTCCTT TAAGGGAAGG 
3601 TTTAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 112 bp to 933 bp; peptide length: 274 
Category: similarity to unknown protein 
Classification: no clue 



1 MATTVSTQRG PVYIGELPQD FLRITPTQQQ RQVQLDAQAA QQLQYGGAVG 
51 TVGRLNITVV QAKLAKNYGM TRMDPYCRLR LGYAVYETPT AHNGAKNPRW 
101 NKVIHCTVPP GVDSFYLEIF DERAFSMDDR IAWTHITIPE SLRQGKVEDK 
151 WYSLSGRQGD DKEGMINLVM SYALLPAAMV MPPQPVVLMP TVYQQGVGYV 
201 PITGMPAVCS PGMVPVALPP AAVNAQPRCS EEDLKAIQDM FPNMDQEVIR 
251 SVLEAQRGNK DAAINSLLQM GEEP 
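
The reported ORF start and the peptide above can be cross-checked by translating the first codons of the insert. The following is a minimal Python sketch and not part of the original annotation pipeline: the function name `translate` and the compact codon-table construction are illustrative assumptions, and the 39-nucleotide string is copied from positions 112 to 150 of the sequence listing for this clone.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Positions 112-150 of the DKFZphtes3_2dl5 insert (start of the reported ORF).
orf_start = "ATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC"
print(translate(orf_start))
```

The output can be compared with the first thirteen residues of the peptide listed for frame 1.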



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_2dl5, frame 1 



TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2, 
N = 1, Score = 385, P = 1.1e-35 



>TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 
Length = 457 



HSPs : 



Score = 385 (57.8 bits), Expect = 1.1e-35, P = 1.1e-35 
Identities = 77/182 (42%), Positives = 118/182 (64%) 



Query: 4 TVSTQRGPVYIGELPQDFLRIT-PTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVVQA 62 

TV+ +R V +GELP FLR+ PQQ+++Q+++ T GRL+ +T+++A 

Sbjct: 5 TVAERRRQVLVGELPPHFLRLAVPIQQTAEPEI-VQP-RMVSFVPP-NTRGRLSVTILEA 61 

Query: 63 KLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIFDE 122 

L KNYG+ RMDPYCR+R+G ++T AN + P WN+ ++ +P V+S Y++IFDE 
Sbjct: 62 NLVKNYGLVRMDPYCRVRVGNVEFDTNVAANAGRAPTWNRTLNAYLPMNVESIYIQIFDE 121 

Query: 123 RAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYAL--LPAAMV 180 

+AF D+ IAW HI +P ++ G D+++ LSG+QG+ KEGMI+L S+A LP 
Sbjct: 122 KAFGPDEVIAWAHIMLPLAIFNGDNIDEYFQLSGQQGEGKEGMIHLHFSFAPIDLPLQQA 181 

Query: 181 MPPQP 185 
P +P 

Sbjct: 182 APAEP 186 

Score = 92 (13.8 bits), Expect = 1.8e-01, P = 1.7e-01 
Identities = 26/68 (38%), Positives = 38/68 (55%) 

Query: 194 QQGVGYVPITGMPAVCSPGMVPV--ALP--PAAVNAQPRCSEEDLKAIQDMFPNMDQEVI 249 

QQG G + + +P +P+ A P PA +EED K IQ+MFP +D+EVI 

Sbjct: 156 QQGEGKEGMIHLHFSFAPIDLPLQQAAPAEPAPAPLPVEITEEDTKEIQEMFPIVDKEVI 215 

Query: 250 RSVLEAQR 257 

+ +LE +R 
Sbjct: 216 KCILEERR 223 
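
The percentages in the two HSP headers above follow directly from the match counts. The short Python sketch below reproduces them under the assumption that the percentage is truncated to an integer rather than rounded; that assumption is inferred from the figures printed here (rounding would give 65% and 56% for the positives), and the helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage, truncated toward zero (assumed from the figures above)."""
    return (100 * matches) // aligned


# First HSP: identities 77/182, positives 118/182.
print(blast_percent(77, 182), blast_percent(118, 182))
# Second HSP: identities 26/68, positives 38/68.
print(blast_percent(26, 68), blast_percent(38, 68))
```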



Pedant information for DKFZphtes3_2dl5, frame 1 



Report for DKFZphtes3_2dl5.1 



[LENGTH] 274 

[MW] 30281.97 

[pi] 5.68 

[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 4e-36 

[PFAM] C2 domain 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 16.42 % 



SEQ MATTVSTQRGPVYIGELPQDFLRITPTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVV 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeccccceeeecccchhhhhhhhhhhhhhhhhcccccceeeeceeeh 

SEQ QAKLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIF 

SEG 

PRD hhhhhhhhcccccccchhhhheeeeeecccccccccccccceeeeeccccccceeeeeec 

SEQ DERAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYALLPAAMV 

SEG xxxxxxxx 

PRD cccccccccceeeeccccccccccccccceeeeeccccccccccceeeeehhhhhhhhhc 

SEQ MPPQPVVLMPTVYQQGVGYVPITGMPAVCSPGMVPVALPPAAVNAQPRCSEEDLKAIQDM 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD ccccceeeeeeeeecccccccccccceeecccccccccccceeeeccccchhhhhhhhhc 



SEQ FPNMDQEVIRSVLEAQRGNKDAAINSLLQMGEEP 

SEG 

PRD ccccchhhhhhhhhhhccccchhhhhhhhhhccc 
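
The LOW_COMPLEXITY keyword in the report above appears to be the fraction of residues masked with `x` on the SEG lines. A minimal Python sketch of that arithmetic follows; the function name `low_complexity_percent` is hypothetical, and the run lengths (17, 8, 10 and 10) are read off the SEG lines of this report.

```python
def low_complexity_percent(masked_runs, peptide_length):
    """Percentage of residues masked as low complexity, to two decimals."""
    return round(100 * sum(masked_runs) / peptide_length, 2)


# DKFZphtes3_2dl5.1: four masked runs in a 274-residue peptide.
print(low_complexity_percent([17, 8, 10, 10], 274))
```

The same calculation can be repeated for other clones by counting the `x` characters on their SEG lines and dividing by the value given under [LENGTH].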



(No Prosite data available for DKFZphtes3_2dl5.1) 



Pfam for DKFZphtes3_2dl5.1 



HMM_NAME C2 domain 

HMM *LtVrIIeARNLWkMDMnGfSDPYVKVdMdPdpkDtkKWKTkTiWNNGLN 
L++++++A+ + + M+ DPY+++ + + + +T T +N N 
Query 55 LNITVVQAKLAKNYGMT-RMDPYCRLRLGYAVY ETPTAHNGAKN 

HMM PVWNEEeFvFedlPyPdlqrkMLRFaVWDWDRFSRBDFIGHCi* 
P+WN + +P + + ++++D+ FS +D I+ + 
Query 98 PRWN-KVIHCT-VPPGVDSF YLEIFDERAFSMDDRIAWTH 135 



DKFZphtes3_2el2 



group: Transcription Factors 

DKFZphtes3_2el2 encodes a novel 849 amino acid protein with similarity to zinc finger 
proteins. 

The new protein is a putative transcription factor with three C2H2 zinc fingers. Additionally, 
the protein contains a cytochrome c family heme-binding site signature, which is otherwise 
found only in cytochrome c-related proteins. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



similarity to finger proteins 

complete cDNA, complete cds, 5 EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3205 bp 

Poly A stretch at pos. 3192, polyadenylation signal at pos. 3171 



1 GGCACGGCCG GGTCCTGGCT GGCCAAACGA GGCTCGCGGA AGCAGCAGCC 

51 GCCGCCTGAC CGCAGCTGGA TTTTGAAGAT TGATCCAAGG GACTGTATTA 

101 ATTTCAGGAA TTGATTTGAA AGACACTGGC TCTGCCACTT AACAGCCATG 

151 TAACCTTGGA TATGGAAGAA AGTAGCAGTG TTGCCATGTT GGTGCCAGAT 

201 ATTGGGGAAC AGGAAGCTAT ACTGACTGCT GAAAGTATCA TCAGTCCTTC 

251 ATTGGAAATT GATGAACAAA GAAAAACTAA ACCAGATCCA TTAATCCATG 

301 TTATCCAGAA GTTAAGCAAG ATAGAAAAAT GAAAAGTCAC AAAAATGTCT 

351 TTTAATTGGG AAGAAACGCC CACGTTCAAG TGCTGCAACA CACTCTCTTG 
401 AAACCCAAGA ACTTTGTGAG ATTCCGGCTA AAGTAATCCA GTCACCTGCT 
451 GCTGATACTA GAAGGGCTGA GATGTCACAA ACAAATTTTA CCCCTGACAC 

501 TCTTGCCCAG AATGAAGGGA AGGCTATGTC TTATCAGTGT AGCCTTTGTA 

551 AGTTTCTATC ATCATCCTTT TCCGTGTTAA AAGATCATAT TAAGCAACAT 

601 GGTCAGCAAA ATGAAGTGAT ACTGATGTGC TCAGAGTGCC ATATTACATC 

651 TAGAAGCCAG GAGGAACTTG AAGCCCACGT GGTGAATGAC CATGACAATG 

701 ATGCCAATAT CCACACCCAA TCCAAAGCCC AACAGTGCGT AAGCCCCTCC 

751 AGCTCTTTGT GTCGGAAAAC CACAGAAAGA AATGAAACCA TTCCAGATAT 

801 CCCAGTAAGT GTGGACAATC TACAGACTCA TACTGTCCAA ACTGCATCTG 

851 TGGCAGAAAT GGGTAGGAGG AAATGGTATG CATACGAACA GTACGGCATG 

901 TATCGATGCT TGTTTTGTAG TTATACTTGT GGCCAGCAGA GAATGTTGAA 

951 AACACACGCT TGGAAACATG CTGGGGAGGT TGATTGCTCC TATCCAATCT 

1001 TTGAAAATGA AAATGAACCC CTAGGCCTGC TGGATTCTTC AGCAGCTGCT 

1051 GCGCCTGGTG GGGTCGATGC AGTCGTCATT GCTATTGGAG ACAGTGAACT 

1101 GAGTATCCAC AATGGGCCAT CAGTGCAAGT GCAGATTTGC AGCTCAGAAC 

1151 AGTTATCATC TTCATCTCCT TTAGAACAGA GTGCAGAAAG AGGAGTACAC 

1201 CTAAGTCAGT CAGTTACCCT GGACCCCAAT GAGGAAGAAA TGCTAGAAGT 

1251 GATTTCTGAT GCAGAGGAGA ATCTGATTCC TGATAGCCTG CTTACATCAG 

1301 CACAGAAAAT CATCAGCAGC AGCCCCAATA AAAAAGGGCA TGTTAACGTG 

1351 ATAGTGGAGC GATTGCCAAG TGCTGAAGAA ACCCTTTCAC AGAAGCGCTT 

1401 CCTCATGAAC ACTGAAATGG AAGAAGGGAA GGACCTGAGC CTGACAGAAG 

1451 CTCAGATTGG GCGCGAAGGA ATGGATGATG TTTATCGTGC TGATAAATGT 

1501 ACTGTTGATA TTGGGGGATT GATCATAGGC TGGAGCAGTT CAGAGAAAAA 

1551 AGACGAGTTA ATGAATAAAG GCCTGGCTAC TGATGAGAAT GCCCCACCAG 

1601 GCCGGAGAAG GACAAATTCT GAGTCTCTTC GATTACACTC ATTAGCTGCA 

1651 GAAGCCCTTG TCACAATGCC TATAAGAGCT GCAGAGTTGA CAAGAGCCAA 

1701 CCTGGGGCAC TATGGAGATA TAAACCTTTT AGATCCAGAT ACTAGTCAAA 

1751 GGCAAGTAGA TAGTACATTG GCAGCGTACT CAAAAATGAT GTCGCCACTT 
1801 AAAAACTCTT CAGATGGATT AACTAGTCTT AACCAAAGCA ACTCCACCTT 

1851 GGTAGCACTC CCAGAGGGTA GGCAGGAATT GTCAGATGGG CAGGTTAAGA 
1901 CAGGCATCAG CATGTCCTTA CTCACCGTCA TTGAAAAATT GAGAGAAAGG 
1951 ACAGACCAAA ACGCTTCAGA CGATGACATT TTGAAAGAGT TGCAGGACAA 
2001 CGCCCAGTGC CAACCCAACA GCGATACAAG TTTGTCCGGA AACAATGTGG 
2051 TGGAATACAT CCCGAATGCT GAACGACCCT ACCGTTGCCG CCTGTGTCAC 
2101 TACACAAGTG GCAACAAGGG CTACATCAAG CAGCACTTAC GAGTCCATCG 
2151 ACAGAGACAG CCTTATCAGT GTCCTATCTG CGAGCACATA GCGGACAACA 
2201 GCAAAGATTT GGAGAGTCAC ATGATCCACC ACTGTAAGAC AAGAATATAC 
2251 CAGTGCAAGC AGTGTGAAGA ATCCTTCCAT TATAAGAGTC AATTGAGGAA 
2301 CCATGAGAGA GAACAGCACA GTCTTCCAGA TACCTTGTCA ATAGCAACTT 
2351 CTAATGAGCC AAGAATTTCC AGTGATACAG CTGATGGAAA ATGTGTCCAG 
2401 GAAGGGAATA AGTCTTCAGT CCAGAAACAA TATAGATGTG ATGTGTGTGA 
2451 TTATACAAGT ACAACATATG TTGGTGTCAG AAACCACAGG CGAATCCATA 
2501 ACTCTGATAA GCCGTACAGA TGCTCTCTGT GTGGGTATGT GTGTAGCCAT 
2551 CCTCCTTCTT TGAAGTCTCA TATGTGGAAA CATGCAAGTG ACCAAAATTA 
2601 CAACTACGAA CAAGTAAACA AGGCTATTAA CGACGCGATT TCACAAAGTG 

2651 GCAGAGTTCT GGGGAAATCC CCTGGAAAGA CTCAATTAAA GAGCAGTGAA 

2701 GAGAGTGCAG ATCCCGTCAC TGGAAGTTCG GAAAATGCAG TGTCATCTTC 

2751 AGAACTGATG TCCCAGACTC CCAGTGAAGT TCTGGGTACC AACGAGAATG 

2801 AGAAACTGAG CCCTACAAGT AATACCTCAT ATAGTTTAGA AAAAATCTCC 

2851 AGTCTGGCCC CTCCTAGCAT GGAGTACTGC GTTTTACTCT TCTGCTGTTG 

2901 TATTTGTGGT TTTGAATCAA CCAGCAAAGA AAACCTCTTG GATCATATGA 

2951 AAGAGCACGA GGGTGAAATT GTAAACATCA TCCTGAATAA GGACCACAAT 

3001 ACAGCTCTAA ACACAAATTA GGTGGAATAA TGACTCGAGC AGGAAAGCAG 

3051 TAGAAGAGGA TTCCTTCACC ACAGTTTCAC CTTTACGCTG TCAGACAACT 

3101 TCCTGCCACA GAAGAAGTCG TTGATGTGAT TTTTGAGGAA ATGACAGATG 

3151 TGACTTTGGA ACCAAACTTG TAATAAAAGG AATTCCAAAT GGAAAAAAAA 

3201 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



90301500: 

Cloning and sequencing of a zinc finger cDNA expressed in mouse testis. 
92310982: 

Zfp-37, a new murine zinc finger encoding gene, is expressed in a 
developmentally regulated 

pattern in the male germ line. 



Peptide information for frame 1 



ORF from 472 bp to 3018 bp; peptide length: 849 
Category: similarity to known protein 



1 MSQTNFTPDT LAQNEGKAMS YQCSLCKFLS SSFSVLKDHI KQHGQQNEVI 
51 LMCSECHITS RSQEELEAHV VNDHDNDANI HTQSKAQQCV SPSSSLCRKT 
101 TERNETIPDI PVSVDNLQTH TVQTASVAEM GRRKWYAYEQ YGMYRCLFCS 
151 YTCGQQRMLK THAWKHAGEV DCSYPIFENE NEPLGLLDSS AAAAPGGVDA 
201 VVIAIGESEL SIHNGPSVQV QICSSEQLSS SSPLEQSAER GVHLSQSVTL 
251 DPNEEEMLEV ISDAEENLIP DSLLTSAQKI ISSSPNKKGH VNVIVERLPS 
301 AEETLSQKRF LMNTEMEEGK DLSLTEAQIG REGMDDVYRA DKCTVDIGGL 
351 IIGWSSSEKK DELMNKGLAT DENAPPGRRR TNSESLRLHS LAAEALVTMP 
401 IRAAELTRAN LGHYGDINLL DPDTSQRQVD STLAAYSKMM SPLKNSSDGL 
451 TSLNQSNSTL VALPEGRQEL SDGQVKTGIS MSLLTVIEKL RERTDQNASD 
501 DDILKELQDN AQCQPNSDTS LSGNNVVEYI PNAERPYRCR LCHYTSGNKG 
551 YIKQHLRVHR QRQPYQCPIC EHIADNSKDL ESHMIHHCKT RIYQCKQCEE 
601 SFHYKSQLRN HEREQHSLPD TLSIATSNEP RISSDTADGK CVQEGNKSSV 
651 QKQYRCDVCD YTSTTYVGVR NHRRIHNSDK PYRCSLCGYV CSHPPSLKSH 
701 MWKHASDQNY NYEQVNKAIN DAISQSGRVL GKSPGKTQLK SSEESADPVT 
751 GSSENAVSSS ELMSQTPSEV LGTNENEKLS PTSNTSYSLE KISSLAPPSM 
801 EYCVLLFCCC ICGFESTSKE NLLDHMKEHE GEIVNIILNK DHNTALNTN 



BLASTP hits 



Entry S10245 from database PIR: 
finger protein, testis - mouse 

Score = 265, P = 8.4e-23, identities = 61/205, positives = 91/205 

Entry S22954 from database PIR: 
finger protein zfp-37 - mouse 

Score = 265, P = 9.1e-22, identities = 61/205, positives = 91/205 
Entry AF031657_1 from database TREMBL: 

gene: "Zfp94"; product: "zinc-finger protein 94"; Rattus norvegicus 
zinc-finger protein 94 (Zfp94) gene, partial cds. 

Score = 243, P = 1.6e-21, identities = 57/190, positives = 85/190 



Alert BLASTP hits for DKFZphtes3_2el2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2el2, frame 1 



Report for DKFZphtes3_2el2.1 



[LENGTH] 849 
[MW] 94325.42 
[pi] 5.47 
[HOMOL] PIR:A54661 zinc finger protein ZNF41 - human (fragment) 2e-22 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 04.03.01 tRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.01.01 rRNA synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 4e-07 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YGL209w] :e-04 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 2e-04 
[FUNCAT] 11.01 stress response [S. cerevisiae, YMR037c] 3e-04 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[SCOP] d1meyg_ 9.6.1.1.1 a designed zinc finger protein [syntheti 8e-06 
[PIRKW] nucleus 8e-18 
[PIRKW] RNA binding 5e-13 
[PIRKW] duplication 7e-13 
[PIRKW] tandem repeat 1e-21 
[PIRKW] spermatogenesis 6e-16 
[PIRKW] zinc 9e-21 
[PIRKW] zinc finger 1e-21 
[PIRKW] DNA binding 1e-21 
[PIRKW] metal binding 3e-15 
[PIRKW] phosphoprotein 5e-13 
[PIRKW] leucine zipper 1e-13 
[PIRKW] alternative splicing 6e-18 
[PIRKW] eye lens 2e-16 
[PIRKW] oocyte 1e-12 
[PIRKW] transcription factor 6e-18 
[PIRKW] segmentation 7e-13 
[PIRKW] embryo 1e-12 
[PIRKW] transcription regulation 2e-19 
[PIRKW] homeobox 2e-08 
[SUPFAM] POZ domain homology 7e-15 
[SUPFAM] transcription factor Krueppel 7e-13 
[SUPFAM] zinc finger protein ZFP-36 1e-21 
[SUPFAM] homeobox homology 2e-08 
[SUPFAM] unassigned homeobox proteins 2e-08 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 10 
[PROSITE] ZINC_FINGER_C2H2 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 18 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 7 
[PFAM] Zinc finger, C2H2 type 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.65 % 



SEQ MSQTNFTPDTLAQNEGKAMSYQCSLCKFLSSSFSVLKDHIKQHGQQNEVILMCSECHITS 
SEG xxxxxxxxxxxxxxx 
1meyF 

SEQ RSQEELEAHVVNDHDNDANIHTQSKAQQCVSPSSSLCRKTTERNETIPDIPVSVDNLQTH 
SEG 
1meyF 

SEQ TVQTASVAEMGRRKWYAYEQYGMYRCLFCSYTCGQQRMLKTHAWKHAGEVDCSYPIFENE 
SEG 
1meyF 

SEQ NEPLGLLDSSAAAAPGGVDAVVIAIGESELSIHNGPSVQVQICSSEQLSSSSPLEQSAER 
SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 
1meyF 

SEQ GVHLSQSVTLDPNEEEMLEVISDAEENLIPDSLLTSAQKIISSSPNKKGHVNVIVERLPS 
SEG 
1meyF 

SEQ AEETLSQKRFLMNTEMEEGKDLSLTEAQIGREGMDDVYRADKCTVDIGGLIIGWSSSEKK 
SEG 
1meyF 

SEQ DELMNKGLATDENAPPGRRRTNSESLRLHSLAAEALVTMPIRAAELTRANLGHYGDINLL 
SEG 
1meyF 

SEQ DPDTSQRQVDSTLAAYSKMMSPLKNSSDGLTSLNQSNSTLVALPEGRQELSDGQVKTGIS 
SEG 
1meyF 

SEQ MSLLTVIEKLRERTDQNASDDDILKELQDNAQCQPNSDTSLSGNNVVEYIPNAERPYRCR 
SEG 
1meyF TTTEETT 

SEQ LCHYTSGNKGYIKQHLRVHRQRQPYQCPICEHIADNSKDLESHMIHHCKTRIYQCKQCEE 
SEG 
1meyF TTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCCCEEETTTTE 

SEQ SFHYKSQLRNHEREQHSLPDTLSIATSNEPRISSDTADGKCVQEGNKSSVQKQYRCDVCD 
SEG 
1meyF EECCHHHHHHHHHHHC 

SEQ YTSTTYVGVRNHRRIHNSDKPYRCSLCGYVCSHPPSLKSHMWKHASDQNYNYEQVNKAIN 
SEG 
1meyF 

SEQ DAISQSGRVLGKSPGKTQLKSSEESADPVTGSSENAVSSSELMSQTPSEVLGTNENEKLS 
SEG 
1meyF 

SEQ PTSNTSYSLEKISSLAPPSMEYCVLLFCCCICGFESTSKENLLDHMKEHEGEIVNIILNK 
SEG 
1meyF 

SEQ DHNTALNTN 
SEG 
1meyF 



Pfam for DKFZphtes3_2el2.1 



HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C++ C+ T R++++L++H H 
Query 53 CSE--CHITSRSQEELEAHVVN-DH 74 

23.25 (bits) f: 539 t: 559 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C C++T ++ ++H+R+H 
dkfzphtes3 539 CRL--CHYTSGNKGYIKQHLRVH 559 

Query f: 567 t: 587 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
CP+ C+ ++ +L+ HM+ H 
Query 567 CPI--CEHIADNSKDLESHMIHH 587 

33.47 (bits) f: 595 t: 616 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C+ C+++F ++S+LR+H R H 
dkfzphtes3 595 CKQ--CEESFHYKSQLRNHERE-QH 616 

Query f: 656 t: 676 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C++ C++T ++ R+H+R+H 
Query 656 CDV--CDYTSTTYVGVRNHRRIH 676 

24.53 (bits) f: 684 t: 704 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
Query *CpwPDCgKtFrrwsNLrRHMRTH* 
C+ CG++ +++ +L+ HM H 
dkfzphtes3 684 CSL--CGYVCSHPPSLKSHMWKH 704 

Query f: 809 t: 829 Target: dkfzphtes3_2el2.1 similarity to finger proteins 
Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C + CG ++++NL HM+ H 
Query 809 CCI--CGFESTSKENLLDHMKEH 829 






WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_2f14 



group: testes derived 

DKFZphtes3_2f14 encodes a novel 129 amino acid protein with very weak similarity to human 
omega protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

weak similarity to omega protein 

complete cDNA, complete cds, 1 EST hit 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2353 bp 

Poly A stretch at pos. 2341, no polyadenylation signal found 



1 GCAGATTCTC CAGGCCCAGC ATCTGCCTCA CCGTGGCCCC CCACAAGCCA 
51 AGCGCCTGCC TTTCAGCAGC CTCTACACAC CCAGCTCCTG CCACCCAATG 
101 GCTCTTTAGG CCAAGCTCAT ACCTCACGAT GATTTTTCCA GGCCCAACTT 
151 TTGTCTCATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT TTCCTGGCAG 
201 CCTGGACAGG CCCAGGTCCT GCCACACACT GGCCTCTCTA CGCCCAGCTC 
251 ATGCCTCACA GTGGCCTCTC CAGGCCCAGC TCCTGTCCCG GGACATCATC 
301 TCCAGGCCCA AAACTTCCTC AAGTCGGCCT CTCCAGGCCC AGTTGCTGCC 
351 TCCCGGCATT CTCTCCAGGC CTAGCTCTTC CTCCTGGCTG TATCTACAAG 
401 ACCAACTCCT GCCTCACAAC AACCTTTTAT GGCTCAGCTC CTGCCCAACT 
451 ACTGCCGGCC TTTGTAGGCC CAAAACTTCC TCAAGTCAAG CTCTTTAGGC 
501 CCACCTTCTG CCTTGCAGTG GCCTGTACAG ACCCAGCTCT GGCTTGAGAA 
551 CAGCCTCTGC AGGCCCTGCT CTTGCCTCTT AGCTCCCTCT CCAGGCCCAT 
601 CTCTTGCCTC ACAGTGGCTT CCGTGGGCCA AGTTCCCGCC TGCCTCCCAG 
651 CAGCCTCAAC AGGCCTAGCT CCTCCCTCAC AATGGCTTGT TTAGGTCCAG 
701 TTGATGCCTC TGGCAACCTG TCCAGGCCCA GCTCCTGCCT CACACTCGCC 
751 TCTCTAGGCC GAGGTCCTTT CTCATACTGG CCTGTTTAGG CCCAGCTCAT 
801 TCCTCTTGTC ATCTCTCCAG GCCCAGCTTT TGCCTGTTGT TGGCCTCTAC 
851 CTCACAGTGC ACCTTCCAGI CCCACCTCTT GCCTCACCAT GGCCTCCTCT 
901 GACCAGGTTC CTGCCTTTCG GCAGCCTCTA CAGGCCTAGC TGCTGCCTCC 
951 CAATGGCCTT TGTAGGCCAC GCTCATGCCT CACTGTGGCC TTTCCAGGCC 
1001 TAGCTTTCGC TTTTTGGCCA CTCCAGGCCC AGAACTTCCC CCAGTCAGCC 
1051 TCTCCAGGCC CAGCTCTTCC TCCCAGCAAC CTCTGCAGGC CCAAATCATC 
1101 CTCAAATTGG CCTCTTCTTT CCCAGCTCCT GCCTCCTGGT GGCCTCTGAA 
1151 GACCCAAATC GTCCTCCAGT TGGTTTTTCC AGGCCCAGCT CCTGCCTTTT 
1201 GGTGGCCTCT CCAGGTGCAA AACTTCCTCC CATCAGCCTG TCCAGGCCCA 
1251 GCTCATGCCT CTTGGTGGCC TTCTCAGGCC CTGCTTTTGA CTTGGTGGCC 
1301 TCTTCAGGCC CAGAACTTGA ACTCAAGTCA GCCTCTCCAG GCCCAGCTCC 
1351 TGCCTTCTTA AGGTCTGTAC AGGCCCAGCC TCTACCTCAC AGCGGACTCT 
1401 CCACACCCAG CTCTTGCCTC ACTGTAGCCT CCCCAGTCCA AAACTCCTGC 
1451 CTTTTGGCAG CTTCGACAAG CCCAGCTCCT GCCTTTCAAT GACCTCTTTA 
1501 GGCCCCGCTC ATTCCTTACA ACGGCCTTTC CAGGCCCAGT TTTTCCCTTT 
1551 TGGCGGCCTC TCCAGGCCCA GAACTTCCTC AAGTCGGCCT CTTTAGGCCC 
1601 AGTTGCTGCC TCCTGGCATC CTCTGCAGGC CGAGCTCTTC CTCCCTGCTG 
1651 TGTCTACAGG CCCAACTCCT GCCTCACAAC AACCTCCTTG GACTCAGCTT 
1701 CTGCCCAGCT CCTGGTGGCC TTTGTAGGCT CAAAATTTTC TCAAATCAAG 
1751 CTCTCCAGGC CTACTGTCAG CCTCGTGGCA GCCTAAACAG GCCCAGCTCC 
1801 TGCCTGACAA TGGCCTCTCC AGGCTTTTCT CCTGCCTCGC AGCAGGCTTT 
1851 CCAGGCCCAG CTCTTGCCTC ATGGTGGCCT TCCCCGGCCA TGTTCCTATC 
1901 TGACTTCTGG CAGCCTCAAC CGGCCCAGCT TCTGCCTCAC ACTGGCCTCT 
1951 CTAGGCCCAG CTCCTTTTTC ACAGTGGCCT CACTACGCCC ATCTCCTACC 
2001 TCAGATCTGC CTCCCAAGAC CCAGCTCCTG TCTCATGGTG GTCTCTCTTA 
2051 CACCAGCTCC TGCCTCACAA TGGCCTCGTC TGGCCCATCT TCTGCCTCAC 
2101 AGTGGCCACT CAAGGCCCAT CTTTTGCCTC ATGGTAGCCT CTTCTGGTTT 
2151 TGCTCTTGCC TCACAGTTGC CTCTTCCAGA TCCAGCTTTA AGCCTTTGAT 
2201 GGTCAACAGC ATCAAGGAGC CTAAAGCTTC CCTGGACTCT CATTTGTTCA 
2251 CTTTACAGCA GAGTGCCTTA GCAAAAACTG TCTCTTAACC TTGAGAGTGG 
2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA 
2351 AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 158 bp to 544 bp; peptide length: 129 
Category: similarity to known protein 



1 MATFPGQVST YFLAAWTGPG PATHWPLYAQ LMPHSGLSRP SSCPGTSSPG 
51 PKLPQVGLSR PSCCLPAFSP GLALPPGCIY KTNSCLTTTF YGSAPAQLLP 
101 AFVGPKLPQV KLFRPTFCLA VACTDPALA 
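
The frame number and peptide length reported for each clone follow from the ORF coordinates. The Python sketch below illustrates the arithmetic, assuming 1-based inclusive coordinates whose end position excludes the stop codon; that convention is inferred from the reported values rather than stated in the text, and the function name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, end: int):
    """Return (reading frame 1-3, codon count) for 1-based inclusive ORF coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return (start - 1) % 3 + 1, length // 3


# ORF coordinates as reported for the clones listed so far.
for clone, start, end in [
    ("DKFZphtes3_2dl5", 112, 933),
    ("DKFZphtes3_2el2", 472, 3018),
    ("DKFZphtes3_2f14", 158, 544),
]:
    print(clone, orf_summary(start, end))
```

Each result can be compared with the "Peptide information for frame N" heading and the stated peptide length of the corresponding clone.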



BLASTP hits 



Entry 170697 from database PIR: 
omega protein - human (fragment) 

Score = 79, P = 2.8e-03, identities = 32/94, positives = 38/94 



Alert BLASTP hits for DKFZphtes3_2f14, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2f14, frame 2 



Report for DKFZphtes3_2f14.2 



[LENGTH] 129 

[MW] 13421.76 

[pi] 9.14 

[PROSITE] MYRISTYL 2 

[KW] Irregular 

[KW] LOW_COMPLEXITY 10.85 % 

SEQ MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR 
SEG xxxxxxxxxxxxxx 



PRD cccccccceeehhhhhcccccccccccccccccccccccccccccccccccccccccccc 

SEQ PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA 

SEG 

PRD cccccccccccccccccccccccccceeeccccccccccccccccccccccccccccccc 

SEQ VACTDPALA 

SEG 

PRD ccccccccc 



Prosite for DKFZphtes3_2f14.2 

PS00008 6->12 MYRISTYL PDOC00008 

PS00008 92->98 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2f14.2) 



DKFZphtes3_2g7 



group: testes derived 

DKFZphtes3_2g7 encodes a novel 359 amino acid protein with similarity to neurofilament 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to neurofilament proteins 

complete cDNA, complete cds, 6 EST hits (5 hits are out of a testis 
library) 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1613 bp 

Poly A stretch at pos. 1595, polyadenylation signal at pos. 1557 



1 GCCACACAGG CTCCTTGGAG TAAGAGTGTG AGAAACTGGA TGAAGACAGC 

51 TGTATTCTTT TGGAAGCGTT CGAGATTGGT CTGTCTCTAC CAACTAAAAA 

101 CTTCTAGCTT AAGTGCAGAG ATTTAAGGAG ATCAACAAAA ACTCAGTCTA 

151 GACATATTAT GAGGCTGGGA GGGTATCAAC AGACTTGAGT TCTTGTCAGC 

201 AAGATCACCT GCTTTTAATA TTGTCCTCAG GGTCTGAGCA CATCTGGAAG 

251 TGAGGTCAAT CAAGTTAGAC CCCAAAAACT TTTGTGACAA CAGTGAAGAG 

301 GGGAAAATAA ACACACCACA AACATGAACC TCAACCCCCC GACATCTGCT 

351 CTTCAGATCG AGGGCAAAGG CAGCCATATT ATGGCTAGAA ATGTAAGCTG 

401 CTTTCTAGTC AGGCACACCC CTCATCCCAG AAGAGTCTGC CACATCAAAG 

451 GCTTGAATAA CATTCCAATC TGTACTGTGA ATGATGATGA GAATGCATTT 

501 GGAACATTGT GGGAAGTTGG CCAGTCTAAC TACTTAGAGA AGAACAGGAT 

551 ACCATTTGCC AATTGCAGTT ACCCCCCGAG CACTGCAGTC CAGAAGAGCC 

601 CTGTAAGAGG AATGTCGCCA GCCCCAAACG GTGCCAAAGT GCCTCCACGG 

651 CCTCATTCTG AGCCCAGTAG AAAAATTAAA GAGTGCTTCA AAACTTCCAG 

701 TGAGAATCCC TTAGTAATTA AAAAGGAAGA AATTAAGGCC AAAAGACCAC 

751 CATCACCTCC AAAGGCATGC TCTACTCCTG GCTCCTGTTC TTCAGGGATG 

801 ACAAGTACCA AGAATGATGT GAAAGCAAAC ACCATTTGCA TACCAAACTA 

851 TCTGGATCAG GAAATAAAAA TCCTGGCAAA GCTCTGTAGC ATTTTGCATA 

901 CTGATTCTCT GGCAGAAGTT TTACAGTGGC TGCTTCATGC AACTTCAAAA 

951 GAAAAAGAGT GGGTCTCAGC TTTGATTCAT TCTGAGCTTG CCGAGATAAA 

1001 CCTGTTAACT CATCACAGAA GAAACACCTC AATGGAACCA GCAGCAGAGA 

1051 CTGGGAAGCC ACCCACAGTT AAATCACCAC CCACAGTTAA ATTGCCCCCA 

1101 AATTTTACTG CAAAATCAAA AGTGCTGACC AGAGATACAG AAGGGGATCA 

1151 ACCAACCAGA GTGTCAAGTC AAGGATCTGA AGAAAACAAG GAAGTACCAA 

1201 AAGAGGCTGA GCACAAGCCT CCACTACTTA TAAGAAGAAA TAATATGAAA 

1251 ATACCTGTTG CAGAATATTT CAGCAAACCA AATTCTCCTC CCAGGCCTAA 

1301 CACTCAGGAG AGTGGATCAG CAAAACCAGT GTCAGCAAGG AGTATACAAG 

1351 AATACAACCT CTGTCCCCAA AGAGCATGTT ATCCTTCAAC ACACCGGAGG 

1401 TAGAAGTTCT AGACTGGGTG AATTCTTTCA TGAATATGAG CTTCACATTT 

1451 ACATCATCAA ATTATTTTTC AAATGAATAT TTTTGGTATT GAGGAATCAA 

1501 GTGGTCCTCT TTATGGTGGC ACATGTAAAT CTAAAAATAC CTGTATGTAA 

1551 TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA 

1601 AAAAAAAAAA AAA 
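
The reported poly A and polyadenylation signal positions can be checked against the end of the sequence above. The Python sketch below is illustrative only: the helper names are hypothetical, the canonical hexamer AATAAA is assumed to be the signal that was searched for, and the reported positions are assumed to be zero-based offsets (one less than the 1-based row labels), which is an inference from the listing rather than something the text states.

```python
def find_signal(seq: str, motif: str = "AATAAA") -> int:
    """Zero-based offset of the last occurrence of the motif, or -1 if absent."""
    return seq.rfind(motif)


def poly_a_start(seq: str) -> int:
    """Zero-based offset at which the trailing run of A residues begins."""
    return len(seq.rstrip("A"))


# Last two rows of the DKFZphtes3_2g7 listing (row labels 1551 and 1601).
tail = ("TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA"
        "AAAAAAAAAA AAA").replace(" ", "")
offset = 1550  # number of bases that precede the tail in the full insert

print(offset + find_signal(tail), offset + poly_a_start(tail))
```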



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 324 bp to 1400 bp; peptide length: 359 
Category: similarity to known protein 

1 MNLNPPTSAL QIEGKGSHIM ARNVSCFLVR HTPHPRRVCH IKGLNNIPIC 

51 TVNDDENAFG TLWEVGQSNY LEKNRIPFAN CSYPPSTAVQ KSPVRGMSPA 

101 PNGAKVPPRP HSEPSRKIKE CFKTSSENPL VIKKEEIKAK RPPSPPKACS 

151 TPGSCSSGMT STKNDVKANT ICIPNYLDQE IKILAKLCSI LHTDSLAEVL 

201 QWLLHATSKE KEWVSALIHS ELAEINLLTH HRRNTSMEPA AETGKPPTVK 

251 SPPTVKLPPN FTAKSKVLTR DTEGDQPTRV SSQGSEENKE VPKEAEHKPP 

301 LLIRRNNMKI PVAEYFSKPN SPPRPNTQES GSAKPVSARS IQEYNLCPQR 
351 ACYPSTHRR 



BLASTP hits 



Entry A43427 from database PIR: 

neurofilament triplet H1 protein - rabbit (fragment) 

Score = 118, P = 5.6e-04, identities = 79/290, positives = 110/290 

Entry RNNFH_1 from database TREMBL: 

Rat high molecular weight neurofilament (NF-H) protein mRNA, 3' end. 
Score = 115, P = 9.5e-04, identities = 59/281, positives = 100/281 

Entry B43427 from database PIR: 

neurofilament protein H form H2 (repetitive region) - rabbit (fragment) 
Score = 111, P = 1.3e-03, identities = 64/269, positives = 102/269 



Alert BLASTP hits for DKFZphtes3_2g7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2g7, frame 3 



Report for DKFZphtes3_2g7.3 



[LENGTH] 359 

[MW] 39725.53 

[pi] 9.45 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.18 % 



SEQ MNLNPPTSALQIEGKGSHIMARNVSCFLVRHTPHPRRVCHIKGLNNIPICTVNDDENAFG 

SEG 

PRD ccccccccceeecccccceeeeccceeeeecccccccccccccccccccccccccccccc 

SEQ TLWEVGQSNYLEKNRIPFANCSYPPSTAVQKSPVRGMSPAPNGAKVPPRPHSEPSRKIKE 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ CFKTSSENPLVIKKEEIKAKRPPSPPKACSTPGSCSSGMTSTKNDVKANTICIPNYLDQE 

SEG 

PRD hcccccccceeeehhhhhhccccccccccccccccccccccccccccceeeeccccchhh 

SEQ IKILAKLCSILHTDSLAEVLQWLLHATSKEKEWVSALIHSELAEINLLTHHRRNTSMEPA 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

SEQ AETGKPPTVKSPPTVKLPPNFTAKSKVLTRDTEGDQPTRVSSQGSEENKEVPKEAEHKPP 

SEG . . . .xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeecccccccceeeeccccccccccccccccccc 

SEQ LLIRRNNMKIPVAEYFSKPNSPPRPNTQESGSAKPVSARSIQEYNLCPQRACYPSTHRR 

SEG 

PRD eeeeccccccceeeeecccccccccccccccccccchhhhhhccccccccccccccccc 



Prosite for DKFZphtes3_2g7.3 



PS00001 23->27 ASN_GLYCOSYLATION PDOC00001 
PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 161->164 PKC_PHOSPHO_SITE PDOC00005 
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 262->265 PKC_PHOSPHO_SITE PDOC00005 
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005 
PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005 
PS00005 356->359 PKC_PHOSPHO_SITE PDOC00005 
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006 
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006 
PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 207->211 CK2_PHOSPHO_SITE PDOC00006 
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 
PS00006 272->276 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00008 153->159 MYRISTYL PDOC00008 
PS00008 158->164 MYRISTYL PDOC00008 
PS00008 284->290 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2g7 . 3 ) 
DKFZphtes3_2hl 



group: transmembrane protein 

DKFZphtes3_2hl encodes a novel 116 amino acid protein with weak similarity to C. elegans 
cosmid C13F10. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 

similarity to C. elegans C13F10.5 
TRANSMEMBRANE 1 
Sequenced by EMBL 
Locus: /map="2" 
Insert length: 1156 bp 

Poly A stretch at pos. 1143, polyadenylation signal at pos. 1121 



1 GGCCATCAAA ATAACTAAAC CATGTCATTT GGAGCAACAA AGCCACTGCG 
51 GCCTCCATTT GGGCCAAGCT CTGACTGCAA TGATGCCTCT GCCCCGACCC 
101 GGGCCTCGCT GTGACTGACA ATGCCGCTGC ATCTTTTCAG CAGTCATTGA 
151 TGAGGAAGTA TCTACATCCT CCTTCCCACT ACCAGATTTT GCTTGGAGAA 
201 AAGCAGTTTC CTGAAATAAT TCTGTGACGA GCTTCTTCCA CATTAGGACA 
251 AAAATGCTGG AAGCGGCTCA GCCCCAGGGC AGCACATCAG AGACACCATG 
301 GAACACAGCC ATTCCTCTGC CGTCGTGCTG GGACCAGTCT TTCCTGACCA 
351 ATATCACCTT CTTGAAGGTT CTTCTCTGGT TGGTCCTGCT GGGACTGTTT 
401 GTGGAACTGG AATTTGGCCT GGCATATTTT GTCCTGTCCT TGTTCTATTG 
451 GATGTACGTC GGGACACGAG GCCCTGAAGA GAAGAAAGAG GGAGAGAAGA 
501 GCGCCTACTC TGTGTTCAAT CCAGGCTGTG AAGCCATCCA GGGCACCCTG 
551 ACTGCAGAGC AGTTGGAGCG CGAGTTACAG TTGAGACCCC TGGCAGGGAG 
601 ATAGGACCCA GCTGTGCTGT CATGCAGCTA ACCTCTGATG TGGTCTTCCT 
651 CACCATTGGC TATGGATTTG ATTTCAGGTG TATAGGACTA AGGGCAGCTT 
701 GCGGGTTAGC TCTGTGACTG CATAGTTTTT CTACCTTCTT TCCCTGATCT 
751 TTTGCTGCCA TTTGATCTTT GATAGTTTTG GTGAAACTCT CTAAAATACA 
801 TTCACTGTGG GTCCGACGCA ATTTATAAAA ATTATGTACT CAAGAAGGGA 
851 GACCTGTTTG TTTCATTTCT CATCTGTTTG GGAGATGATT TTAGAGCACT 
901 AGAAAGGCAC TGGGGAGATT CTCAGCTTAA AACATCCAGC AGTTTGAAGT 
951 ATGATTAGGT ACATCAGGGC TGCATTGTCA ATGTTCTCTT TAAGTCTTTT 
1001 AACATTTATA GCAATTTTTT TTTTCCCGGA GAGTTTAGGT TGCAAGTTTT 
1051 GGGTTTCTTG TTTGTTTTTG TTTTGCTTCC TGCTTTAATT CTTTAATTTT 
1101 CAGTCATTAC TGGTATTGAA AAATAAAATA TCTTTAAAAC ATCAAAAAAA 
1151 AAAAAA 



BLAST Results 



Entry HS313307 from database EMBL: 
human STS SHGC-16715. 
Score = 1222, P = 1.4e-48, identities = 248/2bl 



Medline entries 



No Medline entry 


Peptide information for frame 2 



ORF from 254 bp to 601 bp; peptide length: 116 
Category: similarity to unknown protein 



1 MLEAAQPQGS TSETPWNTAI PLPSCWDQSF LTNITFLKVL LWLVLLGLFV 
51 ELEFGLAYFV LSLFYWMYVG TRGPEEKKEG EKSAYSVFNP GCEAIQGTLT 
101 AEQLERELQL RPLAGR 
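
The ORF coordinates and peptide length quoted for this clone can be cross-checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the standard genetic code is assumed, and the 30-base fragment is taken from positions 254 to 283 of the insert sequence listed above. It also assumes, as the numbers in this listing suggest, that the reported ORF end excludes the stop codon.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# "ORF from 254 bp to 601 bp; peptide length: 116"
print(orf_peptide_length(254, 601))
# First ten codons of the ORF (bases 254-283 of the insert).
print(translate("ATGCTGGAAGCGGCTCAGCCCCAGGGCAGC"))
```

The same arithmetic applies to the other ORF statements in this listing, for example an ORF running from 95 bp to 2659 bp spans 2565 bases, or 855 codons.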
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl, frame 2 

TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10., N = 1, Score = 141, P = 8.2e-10 



>TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10. 

Length = 171 

HSPs: 

Score = 141 (21.2 bits), Expect = 8.2e-10, P = 8.2e-10 
Identities = 32/82 (39%), Positives = 52/82 (63%) 



Query: 27 DQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYS 86 

+QS ++ T + V++++V L ++FG +F+LSL + Y T G ++ GE SAYS 
Sbjct: 90 EQSVVS--TRIAVVVYVVGQALAAWVQFGAVFFILSLILFTYWNT-G--RRRRGEMSAYS 144 

Query: 87 VFNPGCEAIQGTLTAEQLEREL 108 

VFN CE + G++TAE ER++ 
Sbjct: 145 VFNDNCERLAGSMTAEHFERDM 166 
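
The percentages printed next to the identity and positive counts in these HSP headers can be reproduced from the raw counts. The sketch below uses a hypothetical helper name and assumes the percentage is truncated to a whole number rather than rounded, which is what the figures in this listing are consistent with (for example 186/383 is reported as 48%, not 49%).

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


# "Identities = 32/82 (39%), Positives = 52/82 (63%)"
print(hsp_percent(32, 82), hsp_percent(52, 82))
```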

Pedant information for DKFZphtes3_2hl, frame 2 



Report for DKFZphtes3_2hl . 2 

[LENGTH] 116 

[MW] 13092.19 

[pi] 4.64 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 32.76 % 

SEQ MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV 

SEG xxxxxxxxxxxxxxxxxxxxx . . . . 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhchhhhh 
MEM MMMMMXMMMMMMMMMMM 

SEQ LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR 

SEG xxxxxxxxxxxxxxxxx . . 

PRD hhhhhhhhcccccchhhhhcccceeeecccccccccccchhhhhhhhhhccccccc 

MEM 

Prosite for DKFZphtes3_2hl . 2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00007 78->86 TYR_PHOSPHO_SITE PDOC00007 

PS00007 77->86 TYR_PHOSPHO_SITE PDOC00007 

PS00008 97->103 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2hl . 2 ) 
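
Two of the site types in the Prosite table above can be re-derived from the peptide with ordinary regular expressions. This is a minimal sketch, not the PEDANT pipeline: it assumes the published PROSITE patterns PS00001 (N-{P}-[ST]-{P}) and PS00006 ([ST]-x(2)-[DE]), uses a hypothetical `scan` helper, and assumes from the table that a hit is reported as a 1-based start followed by start plus pattern length.

```python
import re

# Frame 2 peptide of this clone, as listed under "Peptide information".
PEPTIDE = (
    "MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFV"
    "ELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLT"
    "AEQLERELQLRPLAGR"
)

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # PS00006: [ST]-x(2)-[DE]
}


def scan(sequence: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, start + length) for every hit, overlapping hits included."""
    # A lookahead wrapper lets finditer report overlapping matches.
    lookahead = re.compile(f"(?=({regex}))")
    hits = []
    for match in lookahead.finditer(sequence):
        start = match.start() + 1  # convert to 1-based numbering
        hits.append((start, start + len(match.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{start}->{end} {name}")
```

The remaining site types (TYR_PHOSPHO_SITE, MYRISTYL) use longer patterns and are left out of the sketch.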
DKFZphtes3_2hl5 



group: testes derived 

DKFZphtes3_2hl5 encodes a novel 855 amino acid protein with very weak similarity to S. pombe 
cdc23. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to cdc23 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4619 bp 

Poly A stretch at pos. 4598, polyadenylation signal at pos. 4589 



1 GAAGGCGTCC CGGCATCGGC CAAGATTCTA CATTGCTCAT CTGGGCATCT 
51 GAGCCTCCTT CGAAGTTTCC TGTCACAACT GTCCTCTTGA CAGCATGGAT 
101 GAGGAGGAAG ACAATCTGTC TCTGCTGACC GCACTGCTGG AAGAAAATGA 
151 GTCAGCCTTG GATTGTAATT CAGAAGAAAA TAACTTCTTG ACGCGGGAAA 
201 ATGGCGAGCC CGACGCATTT GATGAGCTCT TTGATGCCGA CGGCGACGGT 
251 GAATCTTATA CAGAAGAGGC TGATGATGGA GAAACAGGAG AGACAAGAGA 
301 CGAAAAGGAA AATCTGGCCA CTCTCTTTGG AGATATGGAG GACTTAACAG 
351 ATGAAGAAGA AGTTCCCGCA TCACAGTCAA CTGAAAATAG GGTCCTCCCT 
401 GCTCCTGCCC CCAGGCGAGA GAAAACGAAT GAAGAGTTGC AAGAGGAATT 
451 AAGGAATTTG CAAGAGCAAA TGAAGGCCTT ACAAGAGCAG CTAAAAGTAA 
501 CAACAATTAA ACAGACAGCA AGCCCAGCCC GTCTGCAAAA ATCCCCTGAG 
551 AAGTCTCCCC GGCCACCTCT TAAGGAGAGG AGAGTTCAGA GAATTCAGGA 
601 GTCAACATGC TTTTCTGCGG AGCTTGATGT CCCTGCGCTA CCAAGAACCA 
651 AGAGGGTGGC TCGAACACCA AAGCCTTCAC CTCCAGATCC CAAAAGCTCA 
701 TCTTCAAGGA TGACAAGTGC ACCCTCCCAA CCCCTACAGA CGATTTCTCG 
751 GAACAAACCT AGTGGGATAA CTAGAGGTCA AATTGTGGGG ACCCCAGGAA 
801 GTTCTGGGGA AACGACTCAA CCCATCTGTG TGGAAGCCTT CTCTGGTCTG 
851 CGGCTCAGGC GGCCTCGAGT ATCCTCCACA GAAATGAACA AGAAAATGAC 
901 CGGCCGAAAA CTGATCAGAC TGTCTCAGAT CAAGGAAAAG ATGGCCAGAG 
951 AGAAGCTGGA AGAAATAGAT TGGGTGACAT TTGGGGTTAT ATTGAAGAAG 
1001 GTTACGCCAC AGAGTGTGAA TAGTGGAAAA ACCTTCAGCA TATGGAAACT 
1051 GAATGATCTT CGTGACCTGA CACAATGTGT GTCCTTGTTC TTATTTGGAG 
1101 AAGTTCACAA AGCGCTCTGG AAGACGGAGC AGGGGACTGT CGTAGGGATC 
1151 CTCAATGCCA ACCCCATGAA GCCCAACGAT GGTTCAGAGG AGGTGTGTTT 
1201 ATCTATCGAT CATCCTCAGA AGGTCTTAAT TATGGGTGAA GCTCTTGACC 
1251 TGGGAACCTG TAAAGCCAAG AAGAAGAATG GAGAGCCGTG CACGCAGACT 
1301 GTGAATTTGC GTGACTGTGA GTACTGTCAG TACCATGTCC AGGCTCAGTA 
1351 CAAGAAGCTC AGTGCAAAGC GTGCGGATCT GCAGTCCACC TTCTCTGGAG 
1401 GACGAATTCC AAAGAAGTTT GCCCGCAGAG GCACCAGCCT CAAAGAACGG 
1451 CTGTGCCAAG ATGGCTTTTA CTACGGAGGG GTTTCTTCTG CCTCGTATGC 
1501 AGCTTCAATT GCAGCAGCTG TGGCTCCTAA GAAGAAGATT CAAACCACTC 
1551 TGAGTAATCT GGTTGTTAAG GGCACAAACT TGATCATCCA GGAAACACGG 
1601 CAAAAACTCG GAATACCCCA GAAGAGCCTG TCTTGCTCTG AGGAGTTCAA 
1651 GGAACTGATG GACCTGCCGA CGTGTGGAGC CAGGAACTTA AAACAACATT 
1701 TAGCCAAAGC CTCAGCTTCA GGGATTATGG GGAGCCCAAA ACCAGCCATC 
1751 AAGTCCATCT CGGCCTCAGC ACTCTTGAAG CAACAGAAGC AGCGGATGTT 
1801 GGAGATGAGG AGAAGGAAAT CAGAAGAAAT ACAGAAGCGA TTTCTGCAGA 
1851 GCTCAAGTGA AGTTGAGAGC CCAGCTGTGC CATCTTCATC AAGACAGCCC 
1901 CCTGCTCAGC CTCCACGGAC AGGATCCGAG TTCCCCAGGC TGGAGGGAGC 
1951 CCCGGCCACA ATGACGCCCA AGCTGGGGCG AGGTGTCTTG GAAGGAGATG 
2001 ATGTTCTCTT TTATGATGAG TCACCACCAC CAAGACCAAA ACTGAGTGCT 
2051 TTAGCAGAAG CCAAAAAGTT AGCTGCTATC ACCAAATTAA GGGCAAAAGG 
2101 CCAGGTTCTT ACAAAAACAA ACCCAAACAG CATTAAGAAG AAACAAAAGG 
2151 ACCCTCAGGA CATCCTGGAG GTGAAGGAAC GTGTAGAAAA AAACACCATG 
2201 TTTTCTTCTC AAGCTGAGGA TGAATTGGAG CCTGCCAGGA AAAAAAGGAG 
2251 AGAACAACTT GCCTATCTGG AATCTGAGGA ATTTCAGAAA ATCCTAAAAG 
2301 CAAAATCAAA ACACACAGGC ATCCTGAAAG AGGCCGAGGC TGAGATGCAG 
2351 GAGCGCTACT TTGAGCCACT GGTGAAAAAA GAACAAATGG AAGAAAAGAT 
2401 GAGAAACATC AGAGAAGTGA AGTGCCGTGT CGTGACATGC AAGACGTGCG 
2451 CCTATACCCA CTTCAAGCTG CTGGAGACCT GCGTCAGTGA GCAGCATGAA 
2501 TACCACTGGC ATGATGGTGT GAAGAGGTTT TTCAAATGTC CCTGTGGAAA 
2551 CAGAAGCATC TCCTTGGACA GACTCCCGAA CAAGCACTGC AGTAACTGTG 
2601 GCCTCTACAA ATGGGAACGG GACGGAATGC TAAAGGTATG CCATTTGCGT 
2651 ACTAATTTTT GACTCCTTTT AGTGACCCAT GCTAATAATG TGGAACCATC 
2701 TCCTATTAAA ATATTTTCAT TTTTCTAGGA AAAGACTGGT CCAAAGATAG 
2751 GAGGAGAAAC TCTGTTACCA AGAGGAGAAG AACATGCTAA ATTTCTGAAC 
2801 AGCCTTAAAT AACCCGAACT TCAGACATTT TCCCACAGAC TTCCTGGCCT 
2851 CCTGTGACTC TGGAAAGCAA AGGATTGGCT GTGTATTGTC CATTGATTCC 
2901 TGATTGACGC CGTCAAAAAC AAATGCTTGT TAAGCCCATA AGCTTTGCCT 
2951 GCTTACTTTC TGCCATTGGG TTGGTTTGAT ACCACATTTA ACATTGACAT 
3001 TTAAGTGGAA AACCAAGTTA TCATTGTCTT TTCTAAGCTC AGTGTGGATG 
3051 ATTGCATTAC TTCATTCACT GAAGTTTTTG CCCAAAAATT GGAAGGTAAA 
3101 CAGAGAGCTA TGTTTCTGTA TCTTTTGGTT ATAGAGTGTT CACTTCTTTA 
3151 TCATAACAAA ATTCTAGTGT TTATACGAAC ACCCAGAGGC AAAAGAATTT 
3201 GGCTTAATTC TCACTCCAGG TAAGTAGCTT AACTTCTGGG CTTCAGTTTT 
3251 CTCATCTGTA AAATCAGGAA GATTGGACTA AGTGATCCTG AAATGTATTT 
3301 TTTAGCACTG GATTTCTACA AATAATAAAA CTTTCCCATC TAGATAATGA 
3351 TGATCACATA GTCTTGATGT ACGGACATTA AAAGCCAGAT TTCTTCATTC 
3401 AATTCTGTTA TCTCTGTTTT ACTCTTTGAA ATTGATCAAG CCACTGAATC 
3451 ACTTTGCATT TCAGTTTATA TATAGAGAGA GAAAGAAGGC TGTCTGCTCT 
3501 TACATTATTG TGGAGCCCTG TGATAGAAAT ATGTAAAATC TCATATTATT 
3551 TTTTTTTTAA TTTTTTTATT TTTTATGACA GGGTCTCACT ATGTCACCCT 
3601 GGCTGGAGTG CAGTAGTGCG ATCGCGGCAC ACTGCAGCCT TGGCTTCCCT 
3651 GGGCTCAAGC AGTCCTCCCA CCTCAGTCTC CCAAATAGCT AGGACTACAG 
3701 GCGTGCGTGA CCAAGCCCAG CTAATTTTTG CATTTTTTGT AGAGATGGGG 
3751 TTTTGCCATG TTGCTCAGGC TGGTCTCAAA CTCCTGAGCA CTAGCAATCC 
3801 ACCCACCTCT GTTTCCAAAA AAAAAAAAAA AATGAAAGGT CAACCCCTAT 
3851 GCAAATTACC ACAGCAAAGG TTTCATTCAG GAGATTCTTC CATCTGGGCA 
3901 ACCTGGTTTT CCAAATATCA TTTGACCTAA GTGAATGTTG ATACTAGCTA 
3951 AAGATTGGGT AAATTGGTTG AATTATTGTA TTGAAGCTTG AGCTGTAGCT 
4001 AAAAGTAATT TAGGTTTCCC CTAAGATGTT ATTATGTTAG GGACATAACA 
4051 CTTTTGGGAG GTTGTTGTGG GAGATGGTTG ATTTAGGTTT TCAAAAGCTA 
4101 GAAATAAAAT TTACATGCCT TAGATTTCAT AAAATTCTGC TCTAATTGGG 
4151 TGGAAGGTGC TGTATCTAAC TTGTGTTCCT CCTAAGGTTA TGTCCTAATA 
4201 ACTATTCTTT TAGGAGTATA CTTCTACTTT ATAGAAGGTT GCTTTTCTTT 
4251 TTAATTTTTT CTAACAAAGA AAAGAATAAA GTATTTATTA ATAAGAACCA 
4301 GAAAGCACTT GAAACTGATG TTTTTAATGG CTCATTTAGG GTAGATTTAT 
4351 TTATCTCATT AACTTAAAAC AGCTATGTGT ATGAAATAGG TCACAACAGA 
4401 ACTTGAACAC CAGGTTGGTG TCTGAGCAAT CCCTTTCTTA TGGGAAAAAC 
4451 AATGTTCTTG TTTGAACAGA GGGTATCATT GCAGTCAGTA TTCACGTGTA 
4501 TATTGTTATA TAAGTTGTAT AATATGCTTG TAAAGGCTGA GGGTGAGCTG 
4551 TATCTGGATG CCTTTTTACA ATTTGATTTT AACTTTTAAA ATAAATTTAA 
4601 AACATAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 95 bp to 2659 bp; peptide length: 855 
Category: similarity to known protein 
Classification: Cell division 



1 MDEEEDNLSL LTALLEENES ALDCNSEENN FLTRENGEPD AFDELFDADG 
51 DGESYTEEAD DGETGETRDE KENLATLFGD MEDLTDEEEV PASQSTENRV 
101 LPAPAPRREK TNEELQEELR NLQEQMKALQ EQLKVTTIKQ TASPARLQKS 
151 PEKSPRPPLK ERRVQRIQES TCFSAELDVP ALPRTKRVAR TPKPSPPDPK 
201 SSSSRMTSAP SQPLQTISRN KPSGITRGQI VGTPGSSGET TQPICVEAFS 
251 GLRLRRPRVS STEMNKKMTG RKLIRLSQIK EKMAREKLEE IDWVTFGVIL 
301 KKVTPQSVNS GKTFSIWKLN DLRDLTQCVS LFLFGEVHKA LWKTEQGTVV 
351 GILNANPMKP KDGSEEVCLS IDHPQKVLIM GEALDLGTCK AKKKNGEPCT 
401 QTVNLRDCEY CQYHVQAQYK KLSAKRADLQ STFSGGRIPK KFARRGTSLK 
451 ERLCQDGFYY GGVSSASYAA SIAAAVAPKK KIQTTLSNLV VKGTNLIIQE 
501 TRQKLGIPQK SLSCSEEFKE LMDLPTCGAR NLKQHLAKAS ASGIMGSPKP 
551 AIKSISASAL LKQQKQRMLE MRRRKSEEIQ KRFLQSSSEV ESPAVPSSSR 
601 QPPAQPPRTG SEFPRLEGAP ATMTPKLGRG VLEGDDVLFY DESPPPRPKL 
651 SALAEAKKLA AITKLRAKGQ VLTKTNPNSI KKKQKDPQDI LEVKERVEKN 
701 TMFSSQAEDE LEPARKKRRE QLAYLESEEF QKILKAKSKH TGILKEAEAE 
751 MQERYFEPLV KKEQMEEKMR NIREVKCRVV TCKTCAYTHF KLLETCVSEQ 
801 HEYHWHDGVK RFFKCPCGNR SISLDRLPNK HCSNCGLYKW ERDGMLKVCH 
851 LRTNF 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl5, frame 2 

TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell 
division cycle protein 23"; S.pombe chromosome II cosmid C1347., N = 
2, Score = 284, P = 7e-21 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7e-12 

TREMBL:SCDNA52A_1 gene: "DNA52"; Saccharomyces cerevisiae DNA52 gene, 
complete cds., N = 2, Score = 201, P = 7.9e-12 

TREMBLNEW:AC006234_6 gene: "F5H14.6"; Arabidopsis thaliana chromosome 
II BAC F5H14 genomic sequence, complete sequence., N = 2, Score = 211, 
P = 1.7e-15 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7.2e-12 



>TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid C1347. 
Length = 593 

HSPs: 



Score = 284 (42.6 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 97/383 (25%), Positives = 186/383 (48%) 

Query: 109 EKTNEELQEELRNLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQ 168 

E+ + +L+E + LQ Q+ +QE+ ++ + ++ AS + + PR P ++ RV + 
Sbjct: 8 EENDLDLEE--KRLQRQLNEIQEKKRLRSAQKEASSENAEVI--QVPRSPPQQVRVLTVS 63 

Query: 169 ESTCFSAE LDVPALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQP LQTIS 218 

+ + L + K V+ P P PK R+ A +Q L+T+ 
Sbjct: 64 SPSKLKSPKRLILGIDKGKTGKDVSLGKGPRGPLPKPFHERLAEARNQERKRSDKLKTMK 123 

Query: 219 RNKPSGITRGQIVGTPGSSGETTQPI-C--VEAFSGLRLRRPRVSSTEMNKKMTGRKLIR 275 

+N+ R + + G S E P+ C ++ +S + +S + + G ++ 
Sbjct: 124 KNRKQSFQRKRNILEDGKSEEEKFPMKCDEIDPYSRQAIVIRYISDEVAKENIGGNQVYL 183 

Query: 276 LSQIKEKMAREKLE--EID-WVTFGVILKKV-TPQSVNSGKTFSIWKLNDLRDLTQCVSL 331 

+ Q+ + + K E E+D +V G++ T ++VN K + + L DL+ +C 
Sbjct: 184 IHQLLKLVFAPKFEAPEVDNYVVMGIVASNSGTRETVNGNK-YCMLTLTDLKWQLEC 239 

Query: 332 FLFGEVHKALWKTEQGTVVGILNANPMKPKDGS-EEVCLSIDHPQKVLI-MGEALDLGTC 389 

FLFG+ + WK + GTV+ +LN +KPK+ L +D VL+ +G + LG C 
Sbjct: 240 FLFGKAFERYWKIQSGTVIALLNPEVLKPKNPDIGRFSLKLDSEYDVLLEIGRSKHLGYC 299 

Query: 390 KAKKKNGEPCTQTVNLRDCEYCQYHVQAQYKKLSAKRADLQSTFSGGRIPKKFARRGTSL 449 

+++K+GE C ++ R + C+YHV ++ + R + S+ + P+ ARR 
Sbjct: 300 SSRRKSGELCKHWLDKRAGDVCEYHVDLAVQRSMSTRTEFASSMATMHEPR--ARR 353 

Query: 450 KERLCQDGF--YYGGVSSASYAASIAAAVAPKKKIQT 484 

++R GF Y+ G ++ ++A + +QT 
Sbjct: 354 EKRFRGQGFQGYFAGEKYSAIPNAVAGLYDAEDAVQT 390 

Score = 41 (6.2 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 12/43 (27%), Positives = 17/43 (39%) 

Query: 453 LCQDGFYYGGVSSASYAASIAAAVAPKKKIQTTLSNLVVKGTN 495 

L +D S AS A++ K + SN + GTN 
Sbjct: 465 LSKDSEIDSSTKKPSVLASFNASIMNPKSSLPSFSNSAILGTN 507 

Score = 40 (6.0 bits), Expect = 8.9e-21, Sum P(2) = 8.9e-21 





Identities = 13/26 (50%), Positives = 18/26 (69%) 

Query: 536 LAKASASGIMGSPKPAIKSISASALL 561 

LA +AS IM +PK ++ S S SA+L 
Sbjct: 481 LASFNAS-IM-NPKSSLPSFSNSAIL 504 



Pedant information for DKFZphtes3_2hl5, frame 2 



Report for DKFZphtes3_2hl5 . 2 



[LENGTH] 855 

[MW] 96135.01 

[pi] 8.96 

[HOMOL] TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid C1347. 5e-16 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL150c] 1e-11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 12.05 % 

[KW] COILED_COIL 4.21 % 

SEQ MDEEEDNLSLLTALLEENESALDCNSEENNFLTRENGEPDAFDELFDADGDGESYTEEAD 

SEG xxxxx 

PRD cccchhhhhhhhhhhhhhhhccccccccceeeeccccccccceeeecccccccceeeeec 

COILS 

SEQ DGETGETRDEKENLATLFGDMEDLTDEEEVPASQSTENRVLPAPAPRREKTNEELQEELR 

SEG xxxxxxxxxxxx xxxxxxxxx 

PRD cccccccccccchhhhhhcccccccceeeccccccccccccccccccchhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC 

SEQ NLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQESTCFSAELDVP 

SEG xxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccceeeeecccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCC 

SEQ ALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQPLQTISRNKPSGITRGQIVGTPGSSGET 

SEG xxxxxxxxxxxxx 

PRD cccccceeeecccccccccccchhhhhhhccccchhhhhhccccccceeeeecccccccc 

COILS 

SEQ TQPICVEAFSGLRLRRPRVSSTEMNKKMTGRKLIRLSQIKEKMAREKLEEIDWVTFGVIL 

SEG 

PRD cccccccccchhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeeeee 

COILS 

SEQ KKVTPQSVNSGKTFSIWKLNDLRDLTQCVSLFLFGEVHKALWKTEQGTVVGILNANPMKP 

SEG 

PRD cccccccccccceeeeeeeccchhhhhhheeeeecchhhhhhhhccceeeeecccccccc 

COILS 

SEQ KDGSEEVCLSIDHPQKVLIMGEALDLGTCKAKKKNGEPCTQTVNLRDCEYCQYHVQAQYK 

SEG 

PRD ccccceeeeecccccceeeccccccccccccccccccccceeecccccccchhhhhhhhh 

COILS 

SEQ KLSAKRADLQSTFSGGRIPKKFARRGTSLKERLCQDGFYYGGVSSASYAASIAAAVAPKK 

SEG xxxxxxxxxxxxxxxxxxx . . . 

PRD hhhhhhhhhhhhccccccccccccccchhhhhhhccccccccccchhhhhhhhhhhhcch 

COILS 

SEQ KIQTTLSNLVVKGTNLIIQETRQKLGIPQKSLSCSEEFKELMDLPTCGARNLKQHLAKAS 

SEG 

PRD hhhhhhheeecccceeeehhhhhhhcccccccchhhhhhhhhhccccccchhhhhhhhhh 

COILS 

SEQ ASGIMGSPKPAIKSISASALLKQQKQRMLEMRRRKSEEIQKRFLQSSSEVESPAVPSSSR 

SEG xxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 

COILS 

SEQ QPPAQPPRTGSEFPRLEGAPATMTPKLGRGVLEGDDVLFYDESPPPRPKLSALAEAKKLA 

SEG xxxxxxxx xxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeeccccccchhhhhhhhhhhhh 

COILS 

SEQ AITKLRAKGQVLTKTNPNSIKKKQKDPQDILEVKERVEKNTMFSSQAEDELEPARKKRRE 

SEG xxxxx 

PRD hhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhh 

COILS 

SEQ QLAYLESEEFQKILKAKSKHTGILKEAEAEMQERYFEPLVKKEQMEEKMRNIREVKCRVV 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheee 

COILS 

SEQ TCKTCAYTHFKLLETCVSEQHEYHWHDGVKRFFKCPCGNRSISLDRLPNKHCSNCGLYKW 
SEG 

PRD eeecceeeeeeecccceeeccccccccceeeeeecccccccccccccccccccccceeec 

COILS 

SEQ ERDGMLKVCHLRTNF 

SEG 

PRD ccccccccccccccc 

COILS 



(No Prosite data available for DKFZphtes3_2hl5.2) 
(No Pfam data available for DKFZphtes3_2hl5 . 2) 
DKFZphtes3_2i5 



group: testes derived 

DKFZphtes3_2i5 encodes a novel 151 amino acid protein with weak similarity to C. elegans 
cosmid F20D12.3. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to C.elegans F20D12.3 

many ATGs in front of the start of the ORF, 
unspliced intron in 5' region? 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2142 bp 

Poly A stretch at pos. 2121, polyadenylation signal at pos. 2102 



1 GCAGTAAATA TGATATGAAA GAATTCTCTA ACTTGGGGGT GGCTTGTAAC 
51 CTGTAATAAA AATATTGCTA AAATACCTTC TCTCACTTTG AAAAAGCATC 
101 TGAGCAATCC TCAGTTATTG GTGAATTCTT ACCAGTGTTT AATTCCTCTC 
151 TTTCCGTTAT GGTCTTAGTG TGGTTGTCCT GGTGTAGTAT TTCAAGAGGA 
201 ACCTGCAGCA AGATGAAAAG AGAGTGGGAC TTGGAGCTAA GAACGTTTTT 
251 GGCTTTAAGT GCTACGTTAA CTCATTAAAT TCTTAGTGAT CTTGGGGAAG 
301 TCCCCTCACC AGTGTGAGCC TCAGTTTTCT TATCTAATAA GTAAGGATAA 
351 TCTTACCCAC CTTATTGCGG GGGCCCGAGG ATTACATGAT TGGTGTAACA 
401 GTAGCACCTT GTACATTTGA AAGGACTAAT ACCAGTGGAC TTTAACCTTG 
451 GCTGGGCTTT GGAATTCTTG GTGGGACTTT TTAATCATGT AGATTCTCAG 
501 GCCCCTGCCT GGCCTGTGGA ACCACAGACT CTATAGGTGG GCCCTTCCAG 
551 AAGGCCTCAT GGGTGGTTCT CATGTGGAAC CTGTGTTGCA AGCCACTGCA 
601 TGGTGTTACT GCTATTAACA TTAAAACTTA TATTTTCCTT ATTGTGTGGA 
651 TATATCTGTG GTGTTTGCCC ATGTATACTT CATTTTACAT TTCTTAAAGA 
701 ATAGAATGGA ATGGTTTTAA GCACGCTACA TTGTCCAGGT TATACCCACA 
751 GAAGAGCTGT TGTGTAACAG AATCAGCATC ATACCTGAAT CATTTGTACA 
801 TTGCATATAA GACTATGTCT AAGTAGAAGA TGCTATGAAA TCATGTCTGC 
851 TGTGGGGCCA GGCATAATTA TGAATGTTAC TTAAGAGCAT AGGTGAGGTG 
901 AGAAAAGGGA ATGTGACTAG TGTTTTAGTA TTTTCTTGGT GTGGGATGAA 
951 GTATAATTCT .......... TCTCAACAAA GCAGTAAAAC TAGAAAGAAG 
1001 GAGAACTCTT CCCTCAAGAA TGGCTGTACC TTCATATCTA GAGGCACATT 
1051 AAAAAAAAGA ACGTCTGTAC CTTAAAAATG GAGGTCATTT CATTGTGTTC 
1101 ATTTTCAAGG TTGTTGTATG GCTCGGTCAG AACTTTCTGT TACCAGAAGA 
1151 CACTCACATT CAGAATGCTC CATTTCAAGT GTGTTTCACA TCTTTACGGA 
1201 ATGGCGGCCA CCTGCATATA AAAATAAAAC TTAGTGGAGA GATCACTATA 
1251 AATACTGATG ATATTGATTT GGCTGGTGAT ATCATCCAGT CAATGGCATC 
1301 ATTTTTTGCT ATTGAAGACC TTCAAGTAGA AGCGGATTTT CCTGTCTATT 
1351 TTGAGGAATT ACGAAAGGTG CTAGTTAAGG TGGATGAATA TCATTCAGTG 
1401 CATCAGAAGC TCAGTGCTGA TATGGCTGAT CATTCTAATT TGATCCGAAG 
1451 TTTGCTGGTC GGAGCTGAGG ATGCTCGTCT GATGAGGGAC ATGAAAACAA 
1501 TGAAGAGTCG TTATATGGAA CTCTATGACC TTAATAGAGA CTTGCTAAAT 
1551 GGATATAAAA TTCGCTGTAA CAATCACACA GAGCTGTTGG GAAACCTCAA 
1601 AGCAGTAAAT CAAGCAATTC AAAGAGCAGG TCGTCTGCGG GTTGGAAAAC 
1651 CAAAGAACCA GGTGATCACT GCTTGTCGGG ATGCAATTCG AAGCAATAAC 
1701 ATCAACACAC TGTTCAAAAT CATGCGAGTG GGGACAGCTT CTTCCTAGGT 
1751 GAGGAAAATA CAGGTCATGA AGTTCCTGGC AAAGATTTTC TGTTAAAAAC 
1801 CTATGCTGGT TTGCTTTGGA TCACACCCTG GTGAACCCCG GGTGCTAAGA 
1851 ATGAAAATAA CCTTGGTGAG TTGTACAAAT TAAAGACAAA GAACTACATG 
1901 TGAAGATAGA CTTGCTTTCT ATTTTTAAAT CAGTAGTAGT ACTGTTGCTG 
1951 AATAATACTA GGTTTTTATG GAATAGGATG AATGCTTTTG AAGTATTAGG 
2001 GCTTCAGAGT CCAATTTTGC TTATTTATGG TATATAAATA CATATTTTTT 
2051 TCTTGAAATT GCAATTGAGT TTGTACTTTT CAAATAGATT ATCTACTTTT 
2101 TCATTAAAAT GTAAAGATGT TAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 1293 bp to 1745 bp; peptide length: 151 
Category: similarity to unknown protein 
Classification: no clue 

1 MASFFAIEDL QVEADFPVYF EELRKVLVKV DEYHSVHQKL SADMADHSNL 

51 IRSLLVGAED ARLMRDMKTM KSRYMELYDL NRDLLNGYKI RCNNHTELLG 

101 NLKAVNQAIQ RAGRLRVGKP KNQVITACRD AIRSNNINTL FKIMRVGTAS 
151 S 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2i5, frame 3 

TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid 
F20D12., N = 1, Score = 173, P = 4.5e-12 



>TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 
Length = 699 

HSPs: 

Score = 173 (26.0 bits), Expect = 4.5e-12, P = 4.5e-12 
Identities = 33/130 (25%), Positives = 72/130 (55%) 

Query: 20 FEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAEDARLMRDMKTMKSRYMELYD 79 

F+E ++L ++D V +L+A++ + ++ +++ AED+ + ++ + Y+ L 
Sbjct: 569 FKEADEILEEIDPMTEVRDRLTAELQERQAAVKEIIIRAEDSIAIDNIPDARKFYIRLKA 628 

Query: 80 LNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKPKNQVITACRDAIRSNNINT 139 

+ ++R NN + +L+ +N+ I+ RLRVG+P Q++ +CR AI +N 
Sbjct: 629 NDAAARQAAQLRWNNQERCVKSLRRLNKIIENCSRLRVGEPGRQIVVSCRSAIADDNKQI 688 

Query: 140 LFKIMRVGTA 149 

+ KI++ G + 
Sbjct: 689 ITKILQYGAS 698 



Pedant information for DKFZphtes3_2i5, frame 3 



Report for DKFZphtes3_2i5 . 3 



[LENGTH] 151 

[MW] 17304.07 

[pi] 9.33 

[HOMOL] TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 2e 

[KW] Alpha_Beta 



SEQ MASFFAIEDLQVEADFPVYFEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAED 

PRD ccceeeehhhhhhccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ ARLMRDMKTMKSRYMELYDLNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKP 

PRD hhhhhccchhhhhheeeccchhhhhhheeeeeccchhhhhhhhhhhhhhhhhcccccccc 

SEQ KNQVITACRDAIRSNNINTLFKIMRVGTASS 

PRD cceeeeehhhhhhcccceeeeccceeecccc 



(No Prosite data available for DKFZphtes3_2i5.3) 
(No Pfam data available for DKFZphtes3_2i5 . 3 ) 
DKFZphtes3_2119 



group: testes derived 

DKFZphtes3_2119 encodes a novel 166 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

complete cDNA, complete cds, no EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1079 bp 

Poly A stretch at pos. 1053, polyadenylation signal at pos. 1038 



1 CCACAGGACA CACTGTTCCC AGGGCACAGA CACCCTGGGC TTTGGTTGGG 

51 TCTTGGCCTC CAGGTAGGGC CCTGTTGGGC AGCGGGCAGC AACTCCTGAG 

101 ACACTACTGT GATTCTTGGT GGTGGCTGTG GTAAAAAACC TGCAGGGCTA 

151 GAGTTTGGGG TGAGATTCAG CAGTAACTGT GGCCTCTCCT AGTGACAGTA 

201 TGTCACTCCC ACTCCCAGCA CGCATGCCCA CAGGCCACGG CCTCCACATC 

251 ACAAACCCCC CACCAAGTTG CCCATCTATG GAGCAGCTCC CATACGGCAG 

301 GGTCAGGCTC TTACCTCCAC CTCCAGGGCA CAGACAGGGG GAGCTCTGTC 

351 TCACTGTAAG GCAATGAGGA GAGTTGAGGG CCCAGACCAG GCTAGGGGCC 

401 ATCCCCTTTC CCGAGCAGGC CTCAGGGAAG GACCAGCCCC ATTCCCATCT 

451 GACCTAGGTC TTAGCCCAGG AGCCTGCATA GGGAAGAAAG GACAGACAGG 

501 GCCTCCTTAC TGGCTGACAC TCAGGAGGGG CTGGGGCAAG AGAGCAGAGG 

551 GAGCGCAGGG CCAGGCAGGG GCTGCTGAGG ATCCATGGGA GCTCAGGGTG 

601 CACAAGGGGG CTGCCCTTCC TGGGCTGCAG GCAGCATCCC TATGGGAGCT 

651 GAGAAAGTCC AATCCTGAGA TGGGACAGTG CTGCCCAGGG GTGTGTGGCT 

701 GGGCCCTGAC AACAGTCTCC CCAAAAGTGA CCACATCACC AGGCTCAGTT 

751 CCAGGAAGGC TGAGAAGTGC CCAGTACACT GAGGATGCAC CTCAGTTACA 

801 TAAAATAAAT GAAACTGGAG TACTAACGTA CAGTTTAAAG GTTATAGTTA 

851 CTATTTTTAT ATGATATACT AGTAATTTTT GAATAGGGTA AACTTTAGGT 

901 GTTTTGACAC CAAAAGAAAA CTACATGAGT TCATGCATGT GTTAAATTGC 

951 TTTACTGTAG TAATCATTTA CATGTATATG TATATATGAA TATAATTATG 

1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC 

1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA 
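
A listing in this layout (a position label followed by blocks of ten bases) can be joined back into one string and checked against the stated insert length. The sketch below is only an illustration with a hypothetical `parse_listing` helper; it assumes every all-digit field is a position label and treats anything other than A, C, G, T or N as an error. The two rows used are the last two rows of the listing above.

```python
def parse_listing(lines):
    """Join the base blocks of a position-labelled listing into one sequence."""
    blocks = []
    for line in lines:
        for field in line.split():
            if field.isdigit():
                continue  # position label, or a fragment of a split label
            blocks.append(field.upper())
    sequence = "".join(blocks)
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence


tail = parse_listing([
    "1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC",
    "1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA",
])
# The first row starts at position 1001, so the last base sits at 1000 + len(tail).
print(1000 + len(tail))
```

Comparing that end position with the "Insert length" line is a quick way to spot dropped or duplicated blocks in a transcribed listing.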



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 364 bp to 861 bp; peptide length: 166 
Category: putative protein 
Classification: no clue 

1 MRRVEGPDQA RGHPLSRAGL REGPAPFPSD LGLSPGACIG KKGQTGPPYW 
51 LTLRRGWGKR AEGAQGQAGA AEDPWELRVH KGAALPGLQA ASLWELRKSN 
101 PEMGQCCPGV CGWALTTVSP KVTTSPGSVP GRLRSAQYTE DAPQLHKINE 
151 TGVLTYSLKV IVTIFI 



BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_2119, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2119, frame 1 



Report for DKFZphtes3_2119 . 1 



[LENGTH] 166 

[MW] 17691.35 

[pi] 9.54 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 7.23 % 



SEQ MRRVEGPDQARGHPLSRAGLREGPAPFPSDLGLSPGACIGKKGQTGPPYWLTLRRGWGKR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccceeeeecccccc 

SEQ AEGAQGQAGAAEDPWELRVHKGAALPGLQAASLWELRKSNPEMGQCCPGVCGWALTTVSP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeeccccccccchhhhhhhhhhcccccccccccccceeeeeccc 

SEQ KVTTSPGSVPGRLRSAQYTEDAPQLHKINETGVLTYSLKVIVTIFI 

SEG 

PRD ccccccccccccccccccccccccceeeccccceeeehhhhhhccc 



(No Prosite data available for DKFZphtes3_2119.1) 
(No Pfam data available for DKFZphtes3_2119.1) 
DKFZphtes3_2ml3 



group: nucleic acid management 

DKFZphtes3_2ml3 encodes a novel 950 amino acid protein with similarity to mouse Dhm1. 

The protein seems to play a role in nucleotide metabolism, RNA metabolism, but also in DNA 
repair and cell cycle. The yeast homologue is a DNA strand exchange protein required for 
sporulation and homologous recombination. 

The novel protein can find application as a multifunctional nuclease / exoribonuclease. 



nearly identical to mouse Dhm1 

complete cDNA, complete cds, start at Bp 42, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3022 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2981 



1 CTCGTCAGCC GGTCGGCCGC CGCCTCCAGC CGTGTGCCGC TATGGGAGTC 

51 CCGGCGTTCT TCCGCTGGCT CAGCCGCAAG TACCCGTCCA TCATAGTCAA 

101 CTGCGTGGAA GAGAAGCCAA AAGAATGCAA TGGTGTAAAG ATTCCAGTTG 

151 ATGCCAGTAA ACCTAATCCA AATGATGTGG AGTTTGATAA TCTGTATTTG 

201 GATATGAATG GAATCATCCA TCCCTGTACT CATCCTGAAG ACAAACCAGC 

251 ACCAAAAAAT GAAGATGAAA TGATGGTTGC AATTTTTGAG TACATTGACA 

301 GACTTTTCAG TATTGTAAGA CCAAGAAGAC TTCTCTACAT GGCAATAGAT 

351 GGAGTGGCAC CACGTGCTAA AATGAACCAG CAGCGTTCAA GGAGGTTCAG 

401 GGCATCAAAA GAAGGAATGG AAGCAGCAGT CGAGAAGCAG CGAGTCAGGG 

451 AAGAAATATT GGCAAAAGGT GGCTTTCTTC CTCCAGAAGA AATAAAAGAA 

501 AGATTTGACA GCAACTGTAT TACACCAGGA ACTGAATTCA TGGACAATCT 

551 TGCTAAATGC CTTCGCTATT ACATAGCTGA TCGTTTAAAT AATGACCCTG 

601 GGTGGAAAAA TTTGACAGTT ATTTTATCTG ATGCTAGTGC TCCTGGTGAA 

651 GGAGAACATA AAATCATGGA TTACATTAGA AGGCAAAGAG CCCAGCCTAA 

701 CCATGACCCA AATACTCATC ATTGTTTATG TGGAGCAGAT GCTGATCTCA 

751 TTATGCTTGG CCTTGCCACA CATGAACCGA ACTTTACCAT TATTAGAGAA 

801 GAATTCAAAC CAAACAAGCC CAAACCATGT GGTCTTTGTA ATCAGTTTGG 

851 ACATGAGGTC AAAGATTGTG AAGGTTTGCC AAGAGAAAAG AAGGGAAAGC 

901 ATGATGAACT TGCCGATAGT CTTCCTTGTG CAGAAGGAGA GTTTATCTTC 

951 CTTCGGCTTA ATGTTCTTCG TGAGTATTTG GAAAGAGAAC TCACAATGGC 

1001 CAGCCTACCA TTCACATTTG ATGTTGAGAG GAGCATTGAT GACTGGGTTT 

1051 TCATGTGCTT CTTTGTGGGA AATGACTTCC TCCCTCATTT GCCATCGTTA 

1101 GAGATTAGGG AAAATGCAAT TGACCGTTTG GTTAACATAT ACAAAAATGT 

1151 GGTACACAAA ACTGGGGGTT ACCTTACAGA AAGTGGTTAT GTCAATCTGC 

1201 AAAGAGTACA GATGATCATG TTAGCAGTTG GTGAAGTTGA GGATAGCATT 

1251 TTTAAAAAGA GAAAGGATGA TGAGGACAGT TTTAGAAGAC GACAGAAAGA 

1301 AAAAAGAAAG AGAATGAAGA GAGATCAACC AGCTTTCACT CCTAGTGGAA 

1351 TATTAACTCC TCATGCCTTG GGTTCAAGAA ATTCACCAGG TTCTCAAGTA 

1401 GCCAGTAATC CGAGACAAGC AGCCTATGAA ATGAGGATGC AGAATAACTC 

1451 TAGTCCTTCG ATATCTCCTA ATACGAGTTT CACATCTGAT GGCTCCCCGT 

1501 CTCCATTAGG AGGAATTAAG CGAAAAGCAG AAGACAGTGA CAGTGAACCT 

1551 GAGCCAGAGG ATAATGTCAG GTTATGGGAA GCTGGCTGGA AGCAGCGGTA 

1601 CTACAAGAAC AAATTTGATG TGGATGCAGC TGATGAGAAA TTCCGTCGGA 

1651 AAGTTGTGCA GTCGTACGTT GAAGGACTTT GCTGGGTTCT TAGATATTAT 

1701 TACCAGGGCT GTGCTTCCTG GAAGTGGTAT TATCCATTTC ATTATGCACC 

1751 ATTTGCTTCA GACTTTGAAG GCATTGCAGA CATGCCATCT GATTTTGAGA 

1801 AGGGTACGAA ACCGTTTAAA CCACTAGAAC AACTTATGGG GGTATTTCCA 

1851 GCTGCAAGTG GTAATTTTCT ACCTCCATCA TGGCGGAAGC TCATGAGTGA 

1901 TCCTGATTCT AGTATAATTG ACTTCTATCC TGAAGATTTT GCTATTGATT 

1951 TGAATGGGAA GAAATATGCA TGGCAAGGTG TTGCTCTCTT GCCATTCGTG 

2001 GATGAGCGAA GGCTACGAGC TGCCCTAGAA GAGGTATACC CAGACCTCAC 

2051 TCCAGAAGAG ACCAGAAGAA ACAGCCTTGG AGGTGATGTC TTATTTGTGG 

2101 GGAAACATCA CCCACTCCAT GACTTCATTT TAGAGCTGTA CCAGACAGGT 

2151 TCCACAGAGC CAGTGGAGGT ACCCCCTGAA CTATGTCATG GGATTCAAGG 

2201 AAAGTTTTCT TTGGATGAAG AAGCCATTCT TCCAGATCAA ATAGTATGTT 

2251 CTCCTGTTCC TATGTTAAGG GATCTGACAC AGAACACTGT AGTCAGTATT 

2301 AATTTTAAAG ACCCACAGTT TGCTGAAGAT TACATTTTTA AAGCTGTAAT 

2351 GCTTCCAGGA GCAAGAAAGC CAGCAGCAGT ACTGAAACCT AGTGACTGGG 

2401 AAAAATCCAG CAATGGACGG CAGTGGAAGC CTCAGCTTGG CTTTAACCGT 

2451 GACCGGAGGC CTGTGCACCT GGATCAGGCA GCCTTCAGGA CTTTGGGCCA 

2501 TGTGATGCCA AGAGGCTCAG GAACTGGCAT TTACAGCAAT GCTGCACCAC 

2551 CACCTGTGAC TTACCAGGGA AACTTATACA GGCCGCTTTT GAGAGGACAA 

2601 GCCCAGATTC CAAAACTTAT GTCAAATATG AGGCCCCAGG ATTCCTGGCG 

2651 AGGTCCTCCT CCCCTTTTCC AGCAGCAAAG GTTTGACAGA GGCGTTGGGG 
2701 CTGAACCTCT GCTCCCATGG AACCGGATGC TGCAAACCCA GAATGCAGCC 

2751 TTCCAGCCAA ACCAGTACCA GATGCTAGCT GGGCCTGGTG GGTATCCACC 

2801 CAGACGAGAT GATCGTGGAG GGAGACAGGG ATATCCCAGA GAAGGAAGGA 

2851 AATACCCTTT GCCACCACCC TCAGGAAGAT ACAATTGGAA TTAAGCTTTT 

2901 GTAAAGCTTT CCCAAATCCT TTCATCATTC TACAGTTTTA TGCTATTTGT 

2951 GGAAAGATTT CCTTCTCAAG TAGTAGTTTT TAATAAAACT ACAGTACTTT 

3001 GTGTAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 

Medline entries 



95192042:

Characterization of cDNA encoding mouse homolog of fission yeast dhp1+
gene: structural and functional conservation.

97361754:

Cloning and characterization of mouse Dhm2 cDNA, a functional homolog
of budding yeast SEP1.



Peptide information for frame 3 



ORF from 42 bp to 2891 bp; peptide length: 950 
Category: strong similarity to known protein 
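
The ORF coordinates and the peptide length reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the reported range covers only coding codons; the function name `orf_peptide_length` is hypothetical and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in a 1-based, inclusive ORF range.

    Assumption: the range spans whole codons only, so its length in
    base pairs must be a positive multiple of three.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a positive multiple of 3")
    return span // 3


# Coordinates reported for frame 3 of this clone.
print(orf_peptide_length(42, 2891))
```

Under these assumptions the range 42 to 2891 spans 2850 bp, which corresponds to the reported 950 residues.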



1 MGVPAFFRWL SRKYPSIIVN CVEEKPKECN GVKIPVDASK PNPNDVEFDN 
51 LYLDMNGIIH PCTHPEDKPA PKNEDEMMVA IFEYIDRLFS IVRPRRLLYM 
101 AIDGVAPRAK MNQQRSRRFR ASKEGMEAAV EKQRVREEIL AKGGFLPPEE 
151 IKERFDSNCI TPGTEFMDNL AKCLRYYIAD RLNNDPGWKN LTVILSDASA 
201 PGEGEHKIMD YIRRQRAQPN HDPNTHHCLC GADADLIMLG LATHEPNFTI 
251 IREEFKPNKP KPCGLCNQFG HEVKDCEGLP REKKGKHDEL ADSLPCAEGE 
301 FIFLRLNVLR EYLERELTMA SLPFTFDVER SIDDWVFMCF FVGNDFLPHL 
351 PSLEIRENAI DRLVNIYKNV VHKTGGYLTE SGYVNLQRVQ MIMLAVGEVE 
401 DSIFKKRKDD EDSFRRRQKE KRKRMKRDQP AFTPSGILTP HALGSRNSPG 
451 SQVASNPRQA AYEMRMQNNS SPSISPNTSF TSDGSPSPLG GIKRKAEDSD 
501 SEPEPEDNVR LWEAGWKQRY YKNKFDVDAA DEKFRRKVVQ SYVEGLCWVL 
551 RYYYQGCASW KWYYPFHYAP FASDFEGIAD MPSDFEKGTK PFKPLEQLMG 
601 VFPAASGNFL PPSWRKLMSD PDSSIIDFYP EDFAIDLNGK KYAWQGVALL 
651 PFVDERRLRA ALEEVYPDLT PEETRRNSLG GDVLFVGKHH PLHDFILELY 
701 QTGSTEPVEV PPELCHGIQG KFSLDEEAIL PDQIVCSPVP MLRDLTQNTV 
751 VSINFKDPQF AEDYIFKAVM LPGARKPAAV LKPSDWEKSS NGRQWKPQLG 
801 FNRDRRPVHL DQAAFRTLGH VMPRGSGTGI YSNAAPPPVT YQGNLYRPLL 
851 RGQAQIPKLM SNMRPQDSWR GPPPLFQQQR FDRGVGAEPL LPWNRMLQTQ 
901 NAAFQPNQYQ MLAGPGGYPP RRDDRGGRQG YPREGRKYPL PPPSGRYNWN 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_2ml8, frame 3 

PIR:I49635 mouse Dhm1 protein - mouse, N = 1, Score = 4765, P = 0 

PIR:S43891 dhp1 protein - fission yeast (Schizosaccharomyces pombe), N 
= 3, Score = 1172, P = 2e-197 

PIR:S20126 exoribonuclease RAT1 (EC 3.1.11.-) - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 1146, P = 3.8e-175 

PIR:S72531 exonuclease II - fission yeast (Schizosaccharomyces pombe), 
N = 4, Score = 622, P = 4.2e-125 



>PIR:I49635 mouse Dhm1 protein 
Length = 947 

HSPs: 
Score = 4765 (714.9 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 884/930 (95%), Positives = 895/930 (96%) 
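
The percentages in an HSP summary line follow directly from the matched and aligned residue counts. The sketch below parses such a line and recomputes them; the function name `hsp_percentages` is hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the figures shown here (the BLAST program itself may round differently).

```python
import re


def hsp_percentages(line: str) -> dict:
    """Recompute integer percentages from 'Identities = a/b' style fields."""
    pattern = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)")
    result = {}
    for label, matched, total in pattern.findall(line):
        # Assumption: percentages are rounded to the nearest integer.
        result[label] = round(100 * int(matched) / int(total))
    return result


summary = "Identities = 884/930 (95%), Positives = 895/930 (96%)"
print(hsp_percentages(summary))
```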



Query: 1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60 

MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 
Sbjct: 1 MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 60 

Query: 61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 

PCTHPEDKPAPKNEDEMMVAIFEYIDRLF+IVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 
Sbjct: 61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFNIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 

Query: 121 ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 

A K GMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 
Sbjct: 121 AIKGGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 

Query: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 240 

RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPN DPNTHHCLCGADADLIMLG 
Sbjct: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNQDPNTHHCLCGADADLIMLG 240 

Query: 241 LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300 

LATHEPNFTIIREEFKPNKPKPC LCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 
Sbjct: 241 LATHEPNFTIIREEFKPNKPKPCALCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300 

Query: 301 FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 360 

FIFLRLNVLREYLERELTMASLPF FDVERS DDW FMCFFVGNDFLPHLPSLEIRE AI 
Sbjct: 301 FIFLRLNVLREYLERELTMASLPFPFDVERSNDDWEFMCFFVGNDFLPHLPSLEIREGAI 360 

Query: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420 

DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 
Sbjct: 361 DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 420 

Query: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 480 

KRKRMKRDQPAFTPSGILTPHALGSRNSPG QVASNPRQAAYEMRMQ NSSPSISPNTSF 
Sbjct: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGCQVASNPRQAAYEMRMQRNSSPSISPNTSF 480 

Query: 481 TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 

SDGSPSPLGGI+RKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 
Sbjct: 481 ASDGSPSPLGGIRRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 

Query: 541 SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 600 

SYVEGLCWVLRYYYQGCASWKW YPFHYAPFASDFEGIADM S+FEKGTKPFKPLEQLMG 
Sbjct: 541 SYVEGLCWVLRYYYQGCASWKWLYPFHYAPFASDFEGIADMSSEFEKGTKPFKPLEQLMG 600 

Query: 601 VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660 

VFPAASGNFLPP+WRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 
Sbjct: 601 VFPAASGNFLPPTWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 660 

Query: 661 ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 720 

ALEEVYPDLTPEE RRNSLGGDVLFVGK HPL DFILELYQTGSTEPV+VPPELCHGIQG 
Sbjct: 661 ALEEVYPDLTPEENRRNSLGGDVLFVGKLHPLRDFILELYQTGSTEPVDVPPELCHGIQG 720 

Query: 721 KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 780 

FSLDEEAILPDQ VCSPVPMLRDLTQNT VSINFKDPQFAEDY+FKA MLPGARKPA V 
Sbjct: 721 TFSLDEEAILPDQTVCSPVPMLRDLTQNTAVSINFKDPQFAEDYVFKAAMLPGARKPATV 780 

Query: 781 LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 840 

LKP DWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHV PRGSGT +Y+N A P 
Sbjct: 781 LKPGDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVTPRGSGTSVYTNTALLPAN 840 

Query: 841 YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 900 

YQGN YRPLLRGQAQIPKLMSNMRP+DSWRGPPPLFQQ RF+R VGAEPLLPWNRM+Q Q 
Sbjct: 841 YQGNNYRPLLRGQAQIPKLMSNMRPKDSWRGPPPLFQQHRFERSVGAEPLLPWNRMIQNQ 900 



Query: 901 NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ 929 

NAAFQPNQYQML GPGGYPPRRDD RGGRQ 
Sbjct: 901 NAAFQPNQYQMLGGPGGYPPRRDDHRGGRQ 930 



Pedant information for DKFZphtes3_2ml8, frame 3 



Report for DKFZphtes3_2ml8 . 3 



[LENGTH] 950 

[MW] 108582.68 

[pi] 7.26 

[HOMOL] PIR:I49635 mouse Dhm1 protein - mouse 0.0 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR048c] 1e-123 

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YOR048c] 1e-123 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR048c] 1e-123 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGL173c] 3e-79 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL173c] 3e-79 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL173c] 3e-79 

[PIRKW] nucleus 1e-126 

[PIRKW] hydrolase 1e-122 

[PIRKW] exoribonuclease 1e-122 

[PROSITE] MYRISTYL 7 

[PROSITE] AMIDATION 2 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 12 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 6.21 % 







SEQ MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH 

SEG 

PRD cccchhhhhhhhhcceeeeeeecccccccccccccccccccccccccccceeeeccceee 

MEM 

SEQ PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhcceeeeeeeccccchhhhhhhhhhhhh 

MEM 

SEQ ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 

SEG 

PRD hcccccccceeeeeeeccccccccchhhhhhhhhhhhccccccccccccccccccceeec 

MEM 

SEQ LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 

SEG 

PRD ccccccccccccccccccccccceeeccccccccccccccchhhhhhhhhcccccccccc 

MEM 

SEQ FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhchhhhhhhhhhhheeeeeeccccccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE 

SEG xxxxxx 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 

SEG xxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccchhhhhhhhhhhhhhhccccccccccccc 

MEM 

SEQ TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 

SEG xx xxxxxxxxxxx 

PRD ccccccccchhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 

SEQ SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 

SEG 

PRD hhhhhhheeeeeeccccccccccccccccccccccccccccccccccccccccchhhhhh 

MEM 

SEQ VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

SEG 

PRD hccccccccccccccccccccccceeeccccceeeccccccceeeeeeeeeccchhhhhh 

MEM 

SEQ ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 

SEG 

PRD hhhhhccccchhhhhhcccccceeeeeecccchhhhhhhhhcccccceeecccccccccc 

MEM 

SEQ KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV 

SEG 
PRD cccccceeecccceeeccccccccccccceeeeecccccchhhhheeeccccccccccee 

MEM 

SEQ LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 

SEG 

PRD eccccccccccccccccccccccccccccchhhhhhhhhhcccccccccccccccccccc 

MEM 

SEQ YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 

SEG 

PRD cccccchhhhhcccchhhhhcccccccccccccccchhhhhccccccccccccchhhhhh 

MEM 

SEQ NAAFQPNQYQMLAGPGGYPPRRDDRGGRQGYPREGRKYPLPPPSGRYNWN 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccccceeecccccccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_2ml8.3 

PS00001 190->194 ASN_GLYCOSYLATION PDOC00001 
PS00001 247->251 ASN_GLYCOSYLATION PDOC00001 
PS00001 468->472 ASN_GLYCOSYLATION PDOC00001 
PS00001 477->481 ASN_GLYCOSYLATION PDOC00001 
PS00002 826->830 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 675->679 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005 
PS00005 559->562 PKC_PHOSPHO_SITE PDOC00005 
PS00005 613->616 PKC_PHOSPHO_SITE PDOC00005 
PS00005 674->677 PKC_PHOSPHO_SITE PDOC00005 
PS00005 868->871 PKC_PHOSPHO_SITE PDOC00005 
PS00005 944->947 PKC_PHOSPHO_SITE PDOC00005 
PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006 
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006 
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006 
PS00006 501->505 CK2_PHOSPHO_SITE PDOC00006 
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006 
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006 
PS00006 583->587 CK2_PHOSPHO_SITE PDOC00006 
PS00006 619->623 CK2_PHOSPHO_SITE PDOC00006 
PS00006 624->628 CK2_PHOSPHO_SITE PDOC00006 
PS00006 670->674 CK2_PHOSPHO_SITE PDOC00006 
PS00006 723->727 CK2_PHOSPHO_SITE PDOC00006 
PS00006 784->788 CK2_PHOSPHO_SITE PDOC00006 
PS00007 659->667 TYR_PHOSPHO_SITE PDOC00007 
PS00008 125->131 MYRISTYL PDOC00008 
PS00008 375->381 MYRISTYL PDOC00008 
PS00008 450->456 MYRISTYL PDOC00008 
PS00008 600->606 MYRISTYL PDOC00008 
PS00008 825->831 MYRISTYL PDOC00008 
PS00008 829->835 MYRISTYL PDOC00008 
PS00008 926->932 MYRISTYL PDOC00008 
PS00009 638->642 AMIDATION PDOC00009 
PS00009 934->938 AMIDATION PDOC00009 

(No Pfam data available for DKFZphtes3_2ml8.3) 






DKFZphtes3_2m20 



group: testes derived 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



group: unknown 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

EST hits are only from testis or uterus libraries 
remaining intron in 3' UTR, see EST-BLAST 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1341 bp 

Poly A stretch at pos. 1320, polyadenylation signal at pos. 1300 



1 GCAATCCAGG AGCTGAATGG TAACTCTTCC ACAAGCGAAA ACTGTTCGTG 
51 AATACAAGCA AAAGGCCCCC CAAGAGGACC CCTGATATGA TCCAGCAGCC 
101 TCGGGCCCCG CTGGTGTTGG AGAAGGCTTC TGGTGAAGGA TTTGGCAAAA 
151 CCGCCGCTAT TATACAGCTC GCTCCTAAAG CTCCTGTTGA CCTGTGTGAG 
201 ACAGAGAAAC TGAGGGCAGC CTTCTTTGCA GTCCCGTTGG AAATGAGAGG 
251 GTCCTTCCTG GTGCTGCTCC TGAGGGAATG CTTCCGAGAC CTGAGCTGGC 
301 TGGCACTCAT CCATAGCGTC CGTGGGGAGG CGGGGCTGCT GGTGACGAGT 
351 ATCGTCCCGA AGACCCCGTT TTTCTGGGCC ATGCACATCA CTGAGGCTCT 
401 GCACCAGAAC ATGCAGGCTC TGTTTAGCAC CCTGGCTCAG GCGGAGGAGC 
451 AGCAGCCCTA CCTGGAGGCT CCACCGTTAT GCGCGGGACT CGCTGTCTGG 
501 CAGAGTACCA CCTGGGGGAT TATGGACACG CCTGGAACAG GTGTTGGGTG 
551 CTGGACAGGG TGGACACCTG GGCTGTGGTC ATGTTCATTG ATTTTGGACA 
601 GTTGGCCACC ATCCCTGTGC AGTCTCTGCG CCAGCTAGAC AGCGACGACT 
651 TCTGGACCAT CCCACCCCTG ACTCAGCCAT TCATGCTGGA GAAAGACATT 
701 TTGAGTTCGT ATGAGGTTGT CCATCGAATC CTCAAAGGGA AAATCACTGG 
751 TGCTTTGAAC TCGGCGGTAA CTGCTCCTGC ATCTAACTTG GCTGTTGTCC 
801 CTCCACTCCT GCCCTTGGGG TGTCTGCAGC AGGCTGCTGC CTAGGCCTGG 
851 ACACATTGCA CATCCTAAAG TTTGAAGAGT CTAAATAACG GGGCTTCCCT 
901 CAGCATGTTC CCTCTCCTGT TTGCCACGGA TCCAGAGCCA CCTGCCCTGT 
951 CTTCTCGTAC CCCTTTCACT CTTGAGGCCT GGGAGGTGAA AAAGGCCAGA 
1001 CTGTGCCCAG GATTGATTCA ATTTTGCTTT TACTCCCAGC TTCCCTCTCA 
1051 AAAGAGAGTG AAGTCTCATT TGTCATGTGT CTTCAGTTCC CCAACTTGGC 
1101 ATGAACATTT GAACCAAACA TAGGAAACTA CCATTAGGTT GAAAGCCTGA 
1151 GGCAGCTGGG ATGGTCTTTC TTGTGTCTCT TCTTTGCACC CCAGAGCATG 
1201 ATATAAGTGG TCCTAACAGA TTCTGGATAA TGGAGAAGCC CTCTGCTGGT 
1251 TTTCCTGGCA TTCCATGTAG AATAGGTAGA GAATATTTAA CCAATGAGCA 
1301 AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
Peptide information for frame 2 



ORF from 479 bp to 841 bp; peptide length: 121 
Category: questionable ORF 
Classification: no clue 

1 MRGTRCLAEY HLGDYGHAWN RCWVLDRVDT WAVVMFIDFG QLATIPVQSL 
51 RQLDSDDFWT IPPLTQPFML EKDILSSYEV VHRILKGKIT GALNSAVTAP 
101 ASNLAVVPPL LPLGCLQQAA A 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 2 
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 87 bp to 635 bp; peptide length: 183 
Category: putative protein 
Classification: no clue 

1 MIQQPRAPLV LEKASGEGFG KTAAIIQLAP KAPVDLCETE KLRAAFFAVP 

51 LEMRGSFLVL LLRECFRDLS WLALIHSVRG EAGLLVTSIV PKTPFFWAMH 

101 ITEALHQNMQ ALFSTLAQAE EQQPYLEAPP LCAGLAVWQS TTWGIMDTPG 

151 TGVGCWTGWT PGLWSCSLIL DSWPPSLCSL CAS 
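
A peptide listing such as the one above can be checked by translating the insert from the reported ORF start. The sketch below uses the standard genetic code and stops at the first stop codon; the names `CODON_TABLE` and `translate` are hypothetical, and the example fragment is the 30 bases spanning positions 81 to 110 of the insert, so the ORF start at 87 bp falls on base 7 of the fragment.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate from a 1-based start position up to the first stop codon.

    Whitespace (such as the 10-base grouping used in the listing) is ignored.
    Codons containing unexpected characters are rendered as 'X'.
    """
    seq = "".join(dna.split()).upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


fragment = "CCTGATATGA TCCAGCAGCC TCGGGCCCCG"  # insert positions 81 to 110
print(translate(fragment, start_bp=7))
```

The translated fragment can then be compared with the first residues of the frame 3 peptide.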



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2m20, frame 2 



Report for DKFZphtes3_2m20 . 2 



[LENGTH] 121 

[MW] 13436.69 

[pi] 5.81 

[KW] Alpha_Beta 



SEQ MRGTRCLAEYHLGDYGHAWNRCWVLDRVDTWAVVMFIDFGQLATIPVQSLRQLDSDDFWT 

PRD ccchhhhhcccccccccccceeeecccccccceeeeeecccccccccccccccccccccc 

SEQ IPPLTQPFMLEKDILSSYEVVHRILKGKITGALNSAVTAPASNLAVVPPLLPLGCLQQAA 

PRD cccccchhhhhhhcchhhhhhhhhhcccccchhhhhhcccccceeeeccccccccccccc 



SEQ A 
PRD c 
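
The PRD rows can be summarised numerically. The sketch below assumes that each PRD character is a per-residue secondary-structure call with `h`, `e` and `c` standing for helix, strand and coil; that reading is an assumption, as the report does not define the alphabet, and the function name `secondary_structure_fractions` is hypothetical.

```python
from collections import Counter


def secondary_structure_fractions(prd: str) -> dict:
    """Return the fraction of h, e and c calls in a PRD string.

    Characters other than h, e and c (for example spaces) are ignored.
    """
    states = [ch for ch in prd.lower() if ch in "hec"]
    if not states:
        raise ValueError("no h/e/c states found")
    counts = Counter(states)
    return {state: counts[state] / len(states) for state in "hec"}


# Toy string for illustration only, not taken from the report.
print(secondary_structure_fractions("cchhhhee"))
```

Concatenating all PRD rows of one report before calling the function gives the composition for the whole peptide.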



(No Prosite data available for DKFZphtes3_2m20.2) 
(No Pfam data available for DKFZphtes3_2m20 . 2) 

Pedant information for DKFZphtes3_2m20, frame 3 

Report for DKFZphtes3_2m20 . 3 



[LENGTH] 183 

[MW] 19971.49 

[pi] 5.31 

[KW] Alpha_Beta 
SEQ MIQQPRAPLVLEKASGEGFGKTAAIIQLAPKAPVDLCETEKLRAAFFAVPLEMRGSFLVL 

PRD ccccccccceeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhcchhhhhh 

SEQ LLRECFRDLSWLALIHSVRGEAGLLVTSIVPKTPFFWAMHITEALHQNMQALFSTLAQAE 

PRD hhhhhhcchhhhhhhhhhcccceeeeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EQQPYLEAPPLCAGLAVWQSTTWGIMDTPGTGVGCWTGWTPGLWSCSLILDSWPPSLCSL 

PRD hhhcccccccccccceeeecccceeecccccccccccccccccccceeeeccccccceee 

SEQ CAS 

PRD ccc 



(No Prosite data available for DKFZphtes3_2m20.3) 
(No Pfam data available for DKFZphtes3_2m20.3) 

DKFZphtes3_2n9 






group: testes derived 

DKFZphtes3_2n9 encodes a novel 184 amino acid protein with very weak similarity to Homo 
sapiens PAC clone DJ0771P04 from 7q11.21-q11.23. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 



on genomic level encoded by HS1186N24, no splice pattern but EST matches 

Sequenced by EMBL 
Locus: unknown 

Insert length: 1000 bp 

Poly A stretch at pos. 988, polyadenylation signal at pos. 970 



1 CAACTTTTTA AAGATGTGAA TTGGACAGCC AGACTTGCTT ATTTGTCTGA 
51 TATCTTCAGT ATTTTTTAAT GATCTTAATG CTTCTATGCA AGGGAAGAAT 
101 GCAACTTATT TTTCAATGGC AGATAAAGTT GAAGGACAAA AACAGAAGTT 
151 AGAAGCTTGG AAAAACAGAA TTTCTACAGA TTGTTATGAC ATGTTTCATA 
201 ATTTAACAAC AATTATCAAT GAAGTAGGTA ATGATCTTGA TATTGCACAT 
251 CTGCGAAAAG TTATCAGTGA ACATCTTACA AATTTGTTAG AATGTTTTGA 
301 ATTTTATTTT CCATCAAAAG AAGATCCACG CATAGGAAAT TTGTGGATCC 

351 AAAATCCATT TCTTTCATCA AAAGATAACT TAAATTTAAC TGTAACTCTA 
401 CAGGATAAGT TGTTGAAGCT GGCTACCGAC GAAGGATTGA AAATCAGTTT 

451 TGAAAATACA GCATCACTTC CTTCATTTTG GATAAAAGCT AAAAATGACT 
501 ATCCTGAGCT TGCTGAGATT GCTTTAAAAT TGCTGCTTCT TTTCCCCTCA 
551 ACATACCTCT GTGAGACCGG ATTCTCTACT TTAAGTGTTA TTAAAACAAA 
601 ACATAGAAAC AGTTTAAATA TACATTATCC CCTGAGGTAG CATTGTCATC 
651 AATCCAACCT AGATTAGACA AATTAACAAG CAAGAAGCAA GCTCACTTAT 
701 CACATTAAAA GCTTTAAATA TTGATATGTA AGGTATTGGT TCAAAGTATG 
751 CATATAAGCA TTGAGTGTGA GGAATTTGCT ATTTCACTTT AAACTTTCTG 
801 TCTAGTTACA GTTATGGAAG TATGAGAAGT TATGAGTGAA ACAGCAATTT 
851 TCTATATAAA TTGCCTATAT GTATATTTTC AATTAAGAAT GTGTACAGTT 
901 TTTATAATTC TATTTTTCCT CATATTTGTC GTATTTATTA AAATATAATT 
951 TTAAATCTGT TGATTCTAAT ATTAAAACAT TTGATCTTAA AAAAAAAAAA 



BLAST Results 



Entry HS1186N24 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 1186N24 
Score = 4921, P = 5.8e-215, identities = 989/992 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 86 bp to 637 bp; peptide length: 184 
Category: similarity to unknown protein 
Classification: no clue 



1 MQGKNATYFS MADKVEGQKQ KLEAWKNRIS TDCYDMFHNL TTIINEVGND 

51 LDIAHLRKVI SEHLTNLLEC FEFYFPSKED PRIGNLWIQN PFLSSKDNLN 

101 LTVTLQDKLL KLATDEGLKI SFENTASLPS FWIKAKNDYP ELAEIALKLL 

151 LLFPSTYLCE TGFSTLSVIK TKHRNSLNIH YPLR 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2n9, frame 2 

TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC 
clone DJ0771P04 from 7q11.21-q11.23, complete sequence., N = 1, Score = 
94, P = 0.042 



>TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC clone 
DJ0771P04 from 7q11.21-q11.23, complete sequence. 
Length = 533 

HSPs: 

Score = 94 (14.1 bits), Expect = 4.3e-02, P = 4.2e-02 
Identities = 39/177 (22%), Positives = 75/177 (42%) 

Query: 1 MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLD-IAHLRKV 59 

+QG + M D + KL W+ ++ + F L + L+ I + ++ 

Sbjct: 354 LQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAEL 413 

Query: 60 ISEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLK 119 

+E L + F+ Y + + + +PF + D+++ LQ +++ L + LK 

Sbjct: 414 KTEFQKRLSD-FKLY ESELTL FSSPFSTKIDSVH--EELQMEVIDLQCNTVLK 463 

Query: 120 ISFENTASLPSFWIKAKNDYPXXXXXXXXXXXXFPSTYLCETGFSTLSVIKTKHRNSL 177 

++ +P F+ YP F STY+CE FS + + KTK+ + L 

Sbjct: 464 TKYDKVG-IPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQL 520 



Pedant information for DKFZphtes3_2n9 , frame 2 



Report for DKFZphtes3_2n9 . 2 



[LENGTH] 184 

[MW] 21203.53 

[pi] 6.52 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.52 % 



SEQ MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLDIAHLRKVI 

SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhcccceeecccccccchhhhhhhhh 

SEQ SEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLKI 

SEG 

PRD hhhhhhhhhhhhcccccccccccceeeeccccccccccceeeeehhhhhhhhhhhcccee 

SEQ SFENTASLPSFWIKAKNDYPELAEIALKLLLLFPSTYLCETGFSTLSVIKTKHRNSLNIH 

SEG xxxxxxxxxxxx 

PRD eecccccccceeeeecccchhhhhhhhhhhhhcccccccccccceeeeeecccccceeec 

SEQ YPLR 

SEG .... 

PRD cccc 



(No Prosite data available for DKFZphtes3_2n9.2) 
(No Pfam data available for DKFZphtes3_2n9.2) 

DKFZphtes3_30f4 



group: testes derived 

DKFZphtes3_30f4 encodes a novel 192 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

Sequenced by LMU 

Locus: /map="717 . 2-8 cR from top of Chr8 linkage group" 
Insert length: 1388 bp 

Poly A stretch at pos. 1330, polyadenylation signal at pos. 1310 
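
The polyadenylation signal position reported above can be reproduced by scanning the 3' end of the insert for the canonical hexamer. The sketch below is an illustration only: the function name `find_polya_signals` is hypothetical, the inclusion of the `ATTAAA` variant is an assumption, and positions are returned as zero-based offsets, which matches the reported value for this entry but is not a documented convention of the report.

```python
def find_polya_signals(seq, offset=0, motifs=("AATAAA", "ATTAAA")):
    """Return (position, hexamer) pairs for polyadenylation signal motifs.

    `offset` is the zero-based position of the first base of `seq` within
    the full insert. Whitespace in `seq` is ignored.
    """
    cleaned = "".join(seq.split()).upper()
    hits = []
    for i in range(len(cleaned) - 5):
        hexamer = cleaned[i:i + 6]
        if hexamer in motifs:
            hits.append((offset + i, hexamer))
    return hits


# Sequence row starting at position 1301 of this insert.
tail = "TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA"
print(find_polya_signals(tail, offset=1300))
```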



1 CACTGAGCCC TCCTCAGATG GTTAGTGGCT TCCAACAGCC ATCAGGAGTG 

51 TTTCTTGAAT GCCCCAGGTG TGGAGGACTT GGTCTGTGAC CACCTAGAAC 

101 CCCAGAGCTG AACAGGAAGC CGTCCCTGCA GCAACAAGAG GGCTGGAAGG 

151 GGGAGCTGCA GGCCACCCTC GGCTCTCCCA CTGCTGGGGC GGTGATGTTC 

201 GGGTGACATG TTTGAAAAAT ACTCTTAAAG ATACCAACTG TTCCCTTATA 

251 TGGCTAATGG TTTGTGCAGC CACCAGCGAT GGCGGCCCCT ATTAGAGACC 

301 AGGTTTGTTA AAACACCAAA TATTGCTGTC CACACTAGAC ATTAACCGGC 

351 TTCAGAAAAG ATGGACACCT TTTCCCACGC TGTTTCGCTT CTTAACTTTG 

401 GTCCAGCTTT AGCCACCACA CAGCGTGTGA GGGACTGCTG CTGCGGAGTC 

451 AGCCTCGTTT GTCCCTCCGC CTCCCACCAG CATGCGCCGC TTCTGAGAGA 

501 CACCAGCTCC CTGCCTCCAA GCCTGGTGCC ACAGGCCTGT CGTGAGGGAC 

551 CCCTGCTTCC GAGAGCTCCT GGGGGGGTTC TGCCCTTCAC CACCTGGGAG 

601 AGGTGTCAGT TCAGTTCCGA GTTGAACAAG GCCCGTGCAC ACAGCATGTT 

651 GGGGGCCCAG CCCAAAGTTC TTGTCACCTC CTCATGCAAA GCCAGCCATC 

701 ACCCTCCGGC CAGAGCTCAA GGTGGCCCCT TGGCCAGCCC CTCCTTGGGT 

751 CCTCCAGGAG GACTGAGCAC CCCTCCTAGC GGCATCCCTT GCCCTCCACA 

801 GTGCTGCCAG GGGCACGTCG CTCTGTGCCG TGGACTGAGA CCATCCCCTG 

851 GTGACAGAAT GACCCGTTTG TTGGAAATGC CTCGTTGCCA GAGAAACTCC 

901 CCAGGCATCT CGGAACGAAA CTATTTAGTT CCATTGTGAA CTGGCCACGG 

951 GACAGCTTTT TATCAACTTA TTAAGTTGGA GCACTGTAAT CGCGCTTGCT 

1001 GAGTTAGCAG TGGTGGTAAG CGTGTGTTAA ACACATAATG TTACGTTTTA 

1051 GGAGAGAGAG GTCGTAAGGA AGTGTCGTGT CGCTCATGAC TCTCTTCTAT 

1101 TAGTTGGGTA ACAGTGGCCT CATGTTTGTG TCTGTGTGTA CACAGAGCCC 

1151 TTAGGTTCTG CTCTGTTTCT TTGCCAGGTG AATGTTTGTG GCATGCGCTG 

1201 CTGTCCGCGC CCCTCTGTCC TGCGCAGGGT TCAGCTGTGC GGCGCCCTGA 

1251 TTTCCTCCAT GCACACAGAA CCTCCTTGTG TCTGTTTCTC TGTTCCTCTG 

1301 TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA 

1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAG 



BLAST Results 



Entry HS548358 from database EMBL: 
human STS EST67250. 
Score = 2126, P = 1.5e-89, identities = 444/472 

Entry HS670351 from database EMBL: 
human STS WI-18501. 
Score = 2089, P = 7.1e-88, identities = 445/476 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 361 bp to 936 bp; peptide length: 192 
Category: putative protein 
Classification: no clue 
1 MDTFSHAVSL LNFGPALATT QRVRDCCCGV SLVCPSASHQ HAPLLRDTSS 

51 LPPSLVPQAC REGPLLPRAP GGVLPFTTWE RCQFSSELNK ARAHSMLGAQ 

101 PKVLVTSSCK ASHHPPARAQ GGPLASPSLG PPGGLSTPPS GIPCPPQCCQ 

151 GHVALCRGLR PSPGDRMTRL LEMPRCQRNS PGISERNYLV PL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30f4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_30f4, frame 1 



Report for DKFZphtes3_30f4.1 



[LENGTH] 192 

[MW] 20281.56 

[pi] 9.21 

[BLOCKS] BL01013C Oxysterol-binding protein family proteins 

[KW] AllAlpha 

[KW] LOW_COMPLEXITY 10.94 % 



SEQ MDTFSHAVSLLNFGPALATTQRVRDCCCGVSLVCPSASHQHAPLLRDTSSLPPSLVPQAC 

SEG 

PRD ccchhhhheeecccccchhhhhhhhcccceeeeccccccccccccccccccccccccccc 

SEQ REGPLLPRAPGGVLPFTTWERCQFSSELNKARAHSMLGAQPKVLVTSSCKASHHPPARAQ 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhccccceeeeecccccccccccccc 

SEQ GGPLASPSLGPPGGLSTPPSGIPCPPQCCQGHVALCRGLRPSPGDRMTRLLEMPRCQRNS 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccchhhhhhhhcccccccchhhhhccccccccc 

SEQ PGISERNYLVPL 

SEG 

PRD cccccccccccc 



(No Prosite data available for DKFZphtes3_30f4.1) 
(No Pfam data available for DKFZphtes3_30f4.1) 

DKFZphtes3_35b4 



group: cell cycle 

DKFZphtes3_35b4 encodes a novel 1780 amino acid protein which is C-terminally identical to human 
M-phase phosphoprotein-1 (MPP1). 

The novel protein contains an N-terminal Pfam kinesin motor domain and an ATP/GTP-binding site 
motif A (P-loop). MPP1 is expressed and phosphorylated in metaphase. Therefore the novel 
protein seems to be involved in the mitotic spindle during cell division. 

The new protein can find application in modulation of the mitotic spindle. 



"M-phase phosphoprotein-1" extension 



motor protein 
Sequenced by DKFZ 

Locus: /map="750_H_1; 758_H_7; 759_C_9; 847_D_4; 906_D_1; 931_D_3; 944_C_1; 750_G_12; 
800_A_11; 512.1 cR from top of Chr10 linkage group" 

Insert length: 6284 bp 

No poly A stretch found, no polyadenylation signal found 



1 ATCGCAGTGC TGCTCGCGGG TCTGGCTAGT CAGGCGAAGT TTGCAGAATG 
51 GAATCTAATT TTAATCAAGA GGGAGTACCT CGACCATCTT ATGTTTTTAG 
101 TGCTGACCCA ATTGCAAGGC CTTCAGAAAT AAATTTCGAT GGCATTAAGC 
151 TTGATCTGTC TCATGAATTT TCCTTAGTTG CTCCAAATAC TGAGGCAAAC 
201 AGTTTCGAAT CTAAAGATTA TCTCCAGGTT TGTCTTCGAA TAAGACCATT 
251 TACACAGTCA GAAAAAGAAC TTGAGTCTGA GGGCTGTGTG CATATTCTGG 
301 ATTCACAGAC TGTTGTGCTG AAAGAGCCTC AATGCATCCT TGGTCGGTTA 
351 AGTGAAAAAA GCTCAGGGCA GATGGCACAG AAATTCAGTT TTTCCAAGGT 
401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC 
451 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC 
501 GGGCTAACCA ATTCAGGAAA AACATATACA TTTCAAGGGA CAGAAGAAAA 
551 TATTGGCATT CTGCCTCGAA CTTTGAATGT ATTATTTGAT AGTCTTCAAG 
601 AAAGACTGTA TACAAAGATG AACCTTAAAC CACATAGATC CAGAGAATAC 
651 TTAAGGTTAT CATCAGAACA AGAGAAAGAA GAAATTGCTA GCAAAAGTGC 
701 ATTGCTTCGG CAAATTAAAG AGGTTACTGT GCATAATGAT AGTGATGATA 
751 CTCTTTATGG AAGTTTAACT AACTCTTTGA ATATCTCAGA GTTTGAAGAA 
801 TCCATAAAAG ATTATGAACA AGCCAACTTG AATATGGCTA ATAGTATAAA 
851 ATTTTCTGTG TGGGTTTCTT TCTTTGAAAT TTACAATGAA TATATTTATG 
901 ACTTATTTGT TCCTGTATCA TCTAAATTCC AAAAGAGAAA GATGCTGCGC 
951 CTTTCCCAAG ACGTAAAGGG CTATTCTTTT ATAAAAGATC TACAATGGAT 
1001 TCAAGTATCT GATTCCAAAG AAGCCTATAG ACTTTTAAAA CTAGGAATAA 
1051 AGCACCAGAG TGTTGCCTTC ACAAAATTGA ATAATGCTTC CAGTAGAAGT 
1101 CACAGCATAT TCACTGTTAA AATATTACAG ATTGAAGATT CTGAAATGTC 
1151 TCGTGTAATT CGAGTCAGTG AATTATCTTT ATGTGATCTT GCTGGTTCAG 
1201 AACGAACTAT GAAGACACAG AATGAAGGTG AAAGGTTAAG AGAGACTGGG 
1251 AATATCAACA CTTCTTTATT GACTCTGGGA AAGTGTATTA ACGTCTTGAA 
1301 GAATAGTGAA AAGTCAAAGT TTCAACAGCA TGTGCCTTTC CGGGAAAGTA 
1351 AACTGACTCA CTATTTTCAA AGTTTTTTTA ATGGTAAAGG GAAAATTTGT 
1401 ATGATTGTCA ATATCAGCCA ATGTTATTTA GCCTATGATG AAACACTCAA 
1451 TGTATTGAAG TTCTCCGCCA TTGCACAAAA AGTTTGTGTC CCAGACACTT 
1501 TAAATTCCTC TCAAGATAAA TTATTTGGAC CTGTCAAATC TTCTCAAGAT 
1551 GTATCACTAG ACAGTAATTC AAACAGTAAA ATATTAAATG TAAAAAGAGC 
1601 CACCATTTCA TGGGAAAATA GTCTAGAAGA TTTGATGGAA GACGAGGATT 
1651 TGGTTGAGGA GCTAGAAAAC GCTGAAGAAA CTCAAAATGT GGAAACTAAA 
1701 CTTCTTGATG AAGATCTAGA TAAAACATTA GAGGAAAATA AGGCTTTCAT 
1751 TAGCCACGAG GAGAAAAGAA AACTGTTGGA CTTAATAGAA GACTTGAAAA 
1801 AAAAACTGAT AAATGAAAAA AAGGAAAAAT TAACCTTGGA ATTTAAAATT 
1851 CGAGAAGAAG TTACACAGGA GTTTACTCAG TATTGGGCTC AACGGGAAGC 
1901 TGACTTTAAG GAGACTCTGC TTCAAGAACG AGAGATATTA GAAGAAAATG 
1951 CTGAACGTCG TTTGGCTATC TTCAAGGATT TGGTTGGTAA ATGTGACACT 
2001 CGAGAAGAAG CAGCGAAAGA CATTTGTGCC ACAAAAGTTG AAACTGAAGA 
2051 AGCTACTGCT TGTTTAGAAC TAAAGTTTAA TCAAATTAAA GCTGAATTAG 
2101 CTAAAACCAA AGGAGAATTA ATCAAAACCA AAGAAGAGTT AAAAAAGAGA 
2151 GAAAATGAAT CAGATTCATT GATTCAAGAG CTTGAGACAT CTAATAAGAA 
2201 AATAATTACA CAGAATCAAA GAATTAAAGA ATTGATAAAT ATAATTGATC 
2251 AAAAAGAAGA TACTATCAAC GAATTTCAGA ACCTAAAGTC TCATATGGAA 
2301 AACACATTTA AATGCAATGA CAAGGCTGAT ACATCTTCTT TAATAATAAA 
2351 CAATAAATTG ATTTGTAATG AAACAGTTGA AGTACCTAAG GACAGCAAAT 
2401 CTAAAATCTG TTCAGAAAGA AAAAGAGTAA ATGAAAATGA ACTTCAGCAA 
2451 GATGAACCAC CAGCAAAGAA AGGGTCTATC CATGTTAGTT CAGCTATCAC 
2501 TGAAGACCAA AAGAAAAGTG AAGAAGTGCG ACCGAACATT GCAGAAATTG 
2551 AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA 
2601 
2651 
2701 
2751 
2801 
2851 
2901 
2951 
3001 
3051 
3101 
3151 
3201 
3251 
3301 
3351 
3401 
3451 
3501 
3551 
3601 
3651 
3701 
3751 
3801 
3851 
3901 
3951 
4001 
4051 
4101 
4151 
4201 
4251 
4301 
4351 
4401 
4451 
4501 
4551 
4601 
4651 
4701 
4751 
4801 
4851 
4901 
4951 
5001 
5051 
5101 
5151 
5201 
5251 
5301 
5351 
5401 
5451 
5501 
5551 
5601 
5651 
5701 
5751 
5801 
5851 
5901 
5951 
6001 
6051 
6101 
6151 
6201 
6251 



CTCACTATTG 
AAATAAACAG 
AGAATTTAAC 
ATTGCAATTG 
GGAAAAGATC 
TTACAAATAA 
CTACGTACTC 
TCTCAGGGAT 
AGTTAGACCT 
TATCGAATTC 
AGCTATTTGG 
GTCATCAGAT 
GTAAAAGGCT 
AAACCAAGAT 
AAGAAGAATT 
GTAGTTGAAG 
CTATAAGGCA 
TTGAACGTAG 
TCTATCATCT 
TCAGGATTCT 
TGAAAGAAGA 
TTACTTCAAT 
AAAATTGAAA 
AAGCAGATCT 
CTGACTGATG 
AATGCGTGAT 
AAAAGAAAAA 
CAGCAACTCA 
ACAGTATGAG 
AAGACATGCG 
GATCAAGTGC 
ATTGGAAAAA 
AAAGGTCAAA 
ACTAATCTTC 
TAGAAAGAAA 
AAGCACAGAA 
GAGCGTTTTT 
GACAGAGAAA 
TGGTTGCAGC 
CAGAAAGATA 
TAAAATAGAA 
CAGATCCTGA 
TCCAGAAATA 
GTCAACAGAA 
TTCAATTTAC 
TGTACCACAC 
TAATGAAATG 
CACCCAGAAC 
GTCAAAAAGG 
TTCTTTACGG 
AAAAAGAAGG 
TCAATTCTTC 
AAAGCTCTCA 
GAGCCAAACG 
TCAGGCCAAG 
GATTATCAAA 
GAAATGTTTA 
GTATTGTAAA 
GATTGCTGTT 
TTTGTATATT 
TTCCCTATTA 
GTTATTCTCA 
TTTACTTATA 
TACAACTGAT 
AAGTGTGTAC 
CAAGTCTTTA 
CATAGTACAT 
ATGCAATGTG 
AGCATTTTTC 
TTAAAAATAT 
ATCAGATAGT 
CCATCCCCAC 
ACCATTCTTC 
ATCCCATAAA 



AGAATGAACT 
ATTGTTCATT 
TTTAAGTAAA 
CTGAATTACA 
ATGAAATTGT 
TGTTTCACAA 
TTGATTCAGT 
CTGTCAAATG 
TTTAGGTAAT 
AAGAACCCAA 
GAAGAATGTA 
TGAGGAACTG 
ATAAGGATGA 
GACCTACTAA 
GCAAGAAAAA 
GAAAGAGAGC 
AAAATAAAGG 
TCATTCAGCC 
TAAAGCTAGA 
GTCAAAAACA 
AATCACACAG 
TAAAAGAAGA 
GAGGAACTCT 
TCAGAGGAAG 
CCAAAAAGCA 
GAGGATAAAT 
CCAGTGTTCT 
AGGAGCAGTT 
AGAGCATGCA 
AATGACACTA 
TTGAGGCTAA 
TGGAAGGAAA 
TAAAGAACAT 
AAGATGAGTT 
TGGTTAGAAG 
TATACGAAAT 
TTAAGCAACA 
GATAGTGACC 
TTTAGAAATA 
ATGAAATTGA 
ACACAAATCA 
CAAACTTCAA 
AAATAGAGGA 
AATGATCAAA 
ACCTTTACAG 
CAGTGACAGT 
GAGGAGGACT 
TAATTTGAAA 
AACAAAAGGT 
AGTCAGGCAT 
AACACTACAG 
AATCAAAAGC 
AATGTAGAAG 
GAAATTATAC 
TGATTTTAAT 
CGACGACTTC 
ATATAAATTT 
TATAAATGTA 
TTATACATAG 
TTTATAAGGC 
TCTCAGACAT 
TTTATTTTGC 
TTTAACAATG 
TTTACATATC 
AGATCACAAA 
ATTAGAATGT 
GTATATATTT 
AAATACGTGT 
CTTTGAATTA 
ACAAGTAAGT 
AGATCATTCT 
CTCCCCCTGC 
TACTCTGTAT 
TAAATGAGAA 



TAAAAATGAA 
TTCAGCAGGA 
GAGGTCCAAC 
TGTGCAGAAA 
CAAATGAGAT 
ATAAAATTAA 
TTCTCAGATT 
GTTCTGAGGA 
GATTATTTGG 
TAGGGAAAAT 
AAGAGATTGT 
GAACAACAAA 
AAACAATAGA 
AAGAAAAAGA 
AATGTTACTC 
GCTTTCAGAA 
AACTTGAAAC 
AAGTTAGAAC 
AAGAAATTTG 
CCAAAGATTT 
TTAACAAATA 
AGAAGAAGAA 
CTGCAAGCTC 
GAAGAAGATT 
GATTAAGCAA 
TACTGAGGAT 
CAGGAATTAG 
AAATAATCAG 
AAGATCTAAA 
GAAGAACAGG 
ATTAGAGGAA 
AATGCAATGA 
GAGAACAACA 
ACAGGAGTCT 
AAAAAATGAT 
AAAGAGATGA 
GAATGAAATG 
TTCAAAAGTG 
CAGCTAAAAG 
ACAACTAAAA 
TGGATATCAA 
ACTGAACCTC 
TGGATCTGTA 
GCACTCGATT 
CCAAACAAAA 
TGAGATTCCC 
TGGTGAAATG 
TTTCCTATTT 
TGCCATACGT 
CCATAATTGG 
AAATTTGGAG 
AAAGAAGATA 
CAAGTAAAGA 
ACAAGTGAAA 
GGACCAGAAA 
GAACAAAAAC 
TATAGTCATA 
TATATTATGC 
TATAATTTTA 
TTTTTTATAA 
TGGATCAGTG 
TATACAGGAT 
TCTTATGAAT 
TGTTTGGATT 
ACATGTATAT 
CTCACTTATT 
ACGGGGTATG 
ATCATGGAGA 
CAGATAATCC 
TATTATTGAT 
TTTTATCTTA 
AACCGTCAGT 
GCCCATGAGG 
CATGCAAAAA 



AAGGAAGAAA 
ACTTTCTCTT 
AAATTCAGTC 
AGTAAAAATC 
AGAAACTGCT 
TGCACACGAA 
TCAAACATAG 
GGATAATTTG 
TAAGTAAGCA 
TCTTTCCACT 
GAAGGCCTCT 
TTGAAAAATT 
CTAAAGGAGA 
AACTCTTATA 
TTGATGTTCA 
CTTACACAAG 
AATTTTAGAG 
AAGACATTTT 
AAGGAATTTC 
AAATGTAAAG 
ATTTGCAAGA 
ACCAACAGGC 
TGCTCGTACC 
ATGCTGACCT 
GTACAGAAAG 
TAAAATTAAT 
ATATGAAGCA 
AAAGTGGAAG 
TGTTAAAGAG 
AACAAACTCA 
GTTGAAAGGC 
TTTGGAAACC 
CAGATGTGCT 
GAACAGAAAT 
GCTTATCACT 
AAAAATATGC 
GAAATACTGA 
GCGAGAAGAA 
CACTGATATC 
AGGATCATAT 
GCCCAAACGT 
TATCGACAAG 
GTCCTTGACT 
TCCAAAACCT 
TGGCAGTGAA 
AAGGCTCGGA 
TGAAAATAAG 
CAGATGAIAG 
CCATCATCTA 
TGTAAACCTG 
ACTTCTTACA 
ATTGAAACAA 
AAATGTGTCT 
TTTCATCTCC 
ATGAAGGAGA 
AGCCAAATAA 
GTCATTGGAA 
ATTAAATCAC 
ATTCAATAAA 
TAGCTTCTTT 
AAGATCCTAG 
GTAATAGGTC 
TTTTTTTACT 
ATAGCTAGGA 
ACATTATTTA 
TTGTAAACAT 
TGAGATGTTT 
ATGAGGTATC 
AATTACATTC 
TATAGTCACT 
TTTGTTTTTG 
ACCCTTACCA 
TCAATTGATT 
AAAA 



AAGCAGAATT 
TCTGAAAAAA 
AAATTATGAT 
AAGAACAGGA 
ACAAGAAGCA 
AATAGACGAA 
ATTTGCTCAA 
CCAAATACAC 
AGTTAAAGAA 
CTAGTATTGA 
TCCAAAAAAA 
GCAGGCAGAA 
AGGAGCATAA 
CAGCAGCTGA 
AATACAGCAT 
GTGTTACTTG 
ACTCAGAAAG 
GGAAAAGGAA 
AAGAACATCT 
GAACTCAAGC 
TATGAAACAT 
AAGAAACAGA 
CAGAATCTGA 
GAAAGAGAAA 
AGGTATCTGT 
GAACTGGAGA 
GCGAACCATT 
AAGCTATACA 
AAAATAATTG 
GGTAGAACAG 
TGGCCACAGA 
AAAAACAATC 
TGGAAAGCTC 
ATAATGCTGA 
CAAGCGAAAG 
TGAGGACAGG 
CAGCCCAGCT 
CGAGATCAAC 
CAGTAATGTA 
CAGAGACTTC 
ATTAGTTCAG 
TTTTGAAATT 
CITGTGAAGT 
GAGTTAGAGA 
ACACCCTGGT 
AGAGGAAGAG 
AAGAATGCTA 
AAATTCTTCT 
AGAAAACATA 
GCCACTAAGA 
ACATTCTCCC 
TGAGCTCTTC 
CAACCAAAAC 
TATTGATATA 
GTGATCACCA 
ATCACTTATG 
CTTGCATCCT 
TCTGCATATA 
TGAGTCAAAA 
CAAACTGTAT 
GAAAGAGGCT 
AGGTATTTGG 
TTATCTGTTA 
TTTGGAGAAT 
GAAAAGATCT 
TTTGTGGGTA 
TGACACAGGC 
CATCCCCTCA 
TTTAGATCAT 
CTATTGTGCT 
TACCCATTAA 
GCCACTGGTA 
TTATTTTTAG 
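The nucleotide listing can be checked against the peptide reported for frame 3 by translating a row in the matching reading frame. The following is a minimal sketch, not part of the original report: it assumes the standard genetic code, and the `translate` helper and the offset of 2 (chosen so that translation of the row starting at position 2551 begins at position 2553, in frame with an ORF starting at bp 48) are illustrative assumptions.

```python
# Standard genetic code, indexed by bases in T, C, A, G order (assumed, NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate whole codons of dna starting at offset; unreadable codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Row starting at position 2551 of the listing above.
row_2551 = "AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA"
peptide = translate(row_2551, offset=2)
print(peptide)
```

Codons containing a character other than A, C, G or T (for example an OCR-damaged base) are reported as `X` rather than guessed.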



BLAST Results 



Entry HS898149 from database EMBL: 
human STS WI-9217. 



Score = 4247, P = 1.5e-187, identities = 855/862 



Medline entries 



94119956:

Cloning of cDNAs for M-phase phosphoproteins recognized
by the MPM2 monoclonal antibody and determination of the
phosphorylated epitope.

98101856:

Interaction of a Golgi-associated kinesin-like protein with
Rab6.

95122643:

Identification and partial characterization of mitotic
centromere-associated kinesin, a kinesin-related protein that
associates with centromeres during mitosis.



Peptide information for frame 3 



ORF from 48 bp to 5387 bp; peptide length: 1780 
Category: known protein 

Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (152-160) 
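The ORF coordinates and the ATP_GTP_A (P-loop) annotation above can be sanity-checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the regular expression encodes the published PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST], and the fragment is residues 151-170 of the peptide as listed in this entry. The eight-residue pattern ends at residue 159 under 1-based inclusive numbering; the end coordinate of 160 given in the report may follow a different convention.

```python
import re

# PROSITE PS00017 (ATP_GTP_A, "P-loop"): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def orf_codons(start_bp: int, end_bp: int) -> int:
    """Number of whole codons in an ORF given 1-based inclusive coordinates."""
    length = end_bp - start_bp + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def find_p_loops(protein: str, first_residue: int = 1):
    """Return (start, end, match) tuples, numbered from first_residue (inclusive)."""
    return [
        (m.start() + first_residue, m.end() + first_residue - 1, m.group())
        for m in P_LOOP.finditer(protein)
    ]


fragment = "YGLTNSGKTYTFQGTEENIG"  # residues 151-170 of the frame 3 peptide
print(orf_codons(48, 5387))
print(find_p_loops(fragment, first_residue=151))
```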



1 MESNFNQEGV PRPSYVFSAD PIARPSEINF DGIKLDLSHE FSLVAPNTEA
51 NSFESKDYLQ VCLRIRPFTQ SEKELESEGC VHILDSQTVV LKEPQCILGR
101 LSEKSSGQMA QKFSFSKVFG PATTQKEFFQ GCIMQPVKDL LKGQSRLIFT
151 YGLTNSGKTY TFQGTEENIG ILPRTLNVLF DSLQERLYTK MNLKPHRSRE
201 YLRLSSEQEK EEIASKSALL RQIKEVTVHN DSDDTLYGSL TNSLNISEFE
251 ESIKDYEQAN LNMANSIKFS VWVSFFEIYN EYIYDLFVPV SSKFQKRKML
301 RLSQDVKGYS FIKDLQWIQV SDSKEAYRLL KLGIKHQSVA FTKLNNASSR
351 SHSIFTVKIL QIEDSEMSRV IRVSELSLCD LAGSERTMKT QNEGERLRET
401 GNINTSLLTL GKCINVLKNS EKSKFQQHVP FRESKLTHYF QSFFNGKGKI
451 CMIVNISQCY LAYDETLNVL KFSAIAQKVC VPDTLNSSQD KLFGPVKSSQ
501 DVSLDSNSNS KILNVKRATI SWENSLEDLM EDEDLVEELE NAEETQNVET
551 KLLDEDLDKT LEENKAFISH EEKRKLLDLI EDLKKKLINE KKEKLTLEFK
601 IREEVTQEFT QYWAQREADF KETLLQEREI LEENAERRLA IFKDLVGKCD
651 TREEAAKDIC ATKVETEEAT ACLELKFNQI KAELAKTKGE LIKTKEELKK
701 RENESDSLIQ ELETSNKKII TQNQRIKELI NIIDQKEDTI NEFQNLKSHM
751 ENTFKCNDKA DTSSLIINNK LICNETVEVP KDSKSKICSE RKRVNENELQ
801 QDEPPAKKGS IHVSSAITED QKKSEEVRPN IAEIEDIRVL QENNEGLRAF
851 LLTIENELKN EKEEKAELNK QIVHFQQELS LSEKKNLTLS KEVQQIQSNY
901 DIAIAELHVQ KSKNQEQEEK IMKLSNEIET ATRSITNNVS QIKLMHTKID
951 ELRTLDSVSQ ISNIDLLNLR DLSNGSEEDN LPNTQLDLLG NDYLVSKQVK
1001 EYRIQEPNRE NSFHSSIEAI WEECKEIVKA SSKKSHQIEE LEQQIEKLQA
1051 EVKGYKDENN RLKEKEHKNQ DDLLKEKETL IQQLKEELQE KNVTLDVQIQ
1101 HVVEGKRALS ELTQGVTCYK AKIKELETIL ETQKVERSHS AKLEQDILEK
1151 ESIILKLERN LKEFQEHLQD SVKNTKDLNV KELKLKEEIT QLTNNLQDMK
1201 HLLQLKEEEE ETNRQETEKL KEELSASSAR TQNLKADLQR KEEDYADLKE
1251 KLTDAKKQIK QVQKEVSVMR DEDKLLRIKI NELEKKKNQC SQELDMKQRT
1301 IQQLKEQLNN QKVEEAIQQY ERACKDLNVK EKIIEDMRMT LEEQEQTQVE
1351 QDQVLEAKLE EVERLATELE KWKEKCNDLE TKNNQRSNKE HENNTDVLGK
1401 LTNLQDELQE SEQKYNADRK KWLEEKMMLI TQAKEAENIR NKEMKKYAED
1451 RERFFKQQNE MEILTAQLTE KDSDLQKWRE ERDQLVAALE IQLKALISSN
1501 VQKDNEIEQL KRIISETSKI ETQIMDIKPK RISSADPDKL QTEPLSTSFE
1551 ISRNKIEDGS VVLDSCEVST ENDQSTRFPK PELEIQFTPL QPNKMAVKHP
1601 GCTTPVTVEI PKARKRKSNE MEEDLVKCEN KKNATPRTNL KFPISDDRNS
1651 SVKKEQKVAI RPSSKKTYSL RSQASIIGVN LATKKKEGTL QKFGDFLQHS
1701 PSILQSKAKK IIETMSSSKL SNVEASKENV SQPKRAKRKL YTSEISSPID
1751 ISGQVILMDQ KMKESDHQII KRRLRTKTAK



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b4, frame 3 

TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase
phosphoprotein-1 mRNA, partial cds., N = 1, Score = 3743, P = 0

PIR:A36881 MPM2-reactive phosphoprotein 1 - human (fragment), N = 2,
Score = 2808, P = 2.5e-294

TREMBL:AF070672_1 product: "rabkinesin6"; Homo sapiens rabkinesin6
mRNA, complete cds., N = 2, Score = 680, P = 2.6e-99



>TREMBL:U93121_1 product: "M-phase phosphoprotein-1" ; Human M-phase 
phosphoprotein-1 mRNA, partial cds. 
Length = 753 

HSPs : 

Score = 3743 (561.6 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 752/753 (99%), Positives = 753/753 (100%) 



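Query: 1028 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 1087
VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE
Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 60

Query: 1088 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147
LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI
Sbjct: 61 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 120

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 1207
LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE
Sbjct: 121 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 180

Query: 1208 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 1267
EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS
Sbjct: 181 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 240

Query: 1268 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 1327
VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL
Sbjct: 241 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 300

Query: 1328 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 1387
NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS
Sbjct: 301 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 360

Query: 1388 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 1447
NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY
Sbjct: 361 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 420

Query: 1448 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 1507
AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI
Sbjct: 421 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 480

Query: 1508 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 1567
EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE
Sbjct: 481 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 540

Query: 1568 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK 1627
VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTV+IPKARKRKSNEMEEDLVK
Sbjct: 541 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK 600

Query: 1628 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 1687
CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE
Sbjct: 601 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 660

Query: 1688 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 1747
GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS
Sbjct: 661 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 720

Query: 1748 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 1780
PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK
Sbjct: 721 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 753

Score = 197, P = 2.1e-11
Identities = 114/542 (21%), Positives = 253/542 (46%)

[Alignment rows for Query 692-807 and Sbjct 1-117 are not recoverable from the source; only scattered match-line fragments survive.]

Query: 808 KGSIHVSSAITEDQKKSEEVRPNIAE-IEDIRVLQENNEGLRAFLLTIENELKNEK 862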






+ + S I + ++ +E + ++ + +++ + L L+ + + N L++ K 

Sbjct: 118 QDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQ 177 

Query: 863 — EEKAELNKQIVH-FQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEE 919 

EE+ E N+Q ++ELS S + L ++Q+ + +Y A+L K K + ++ 
SbjCt: 178 LKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDY ADL KEKLTDAKK 230 

Query: 920 KIMKLSNEIETATRSITNNVSQIKLMHTKI DEL-RTLDSVSQISNI DLLNLRDLSNGSEE 978 

+1 ++ E+ S+ + + KL+ KI+EL + + SQ +D+ R + E+ 

Sbjct: 231 QIKQVQKEV SVMRD — EDKLLRIKINELEKKKNQCSQ — ELDMKQ-RTIQQLKEQ 280 

Query: 979 DNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAIWEECKEIVKASSKKSHQI 1038 

N N + + + Y + K+ ++E E+ ++E + E + K ++ 

Sbjct: 281 LN — NQKVEEAIQQY — ERACKDLNVKEKIIED-MRMTLEEQEQTQVEQDQVLEAKLEEV 335 

Query: 1039 EELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEELQEKNVT 1094 

E L ++EK + + + +NN+ KEH+N D+L + L +L+E Q+ N 
Sbjct: 336 ERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKW 395 

Query: 1095 LDVQIQHVVEGKRA LSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

L++++ + KA ++ + + + E+E IL Q E+ + ++ 

Sbjct: 396 LEEKMMLITQAKEAENIRNKEMKKYAEDRERFFKQQNEME-ILTAQLTEKDSDLQKWRE- 453 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELK-LKEEITQLTNNLQDMKHLLQLK 1206 

E++ ++ LE LK + +V+ KD +1-+LK + E +++ + D+K + 
Sbjct: 454 -ERDQLVAALEIQLKAL ISSNVQ— KDNEIEQLKRIISETSKIETQIMDIK PKR 504 

Query: 1207 EEEEETNRQETEKLKEELSASSARTQN 1233 

+ ++ +TE L S + ++ 

SbjCt: 505 ISSADPDKLQTEPLSTSFEISRNKIED 531 

Score = 186 (27.9 bits), Expect = 3.2e-10, P = 3.2e-10 
Identities = 131/674 (19%), Positives = 294/674 (43%) 

Query: 673 LELKFNQIKAELAKTKGELIKT-KEELKKRENESDSLIQELETSNKKI ITQNQRIKELIN 731 

L+ K ++ + +L K K LI+ KEEL+++ D IQ + + + Q + 
SbjCt: 35 LKEKEHKNQDDLLKEKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKA 94 

Query: 732 IIDQKEDTINEFQNL-KSHMENTFKCNDKADTSSLIINNKLICNETVEVPKDSKSKICSE 790 

I + E TI E Q + +SH + D + S+I+ + EE +DS 
Sbjct: 95 KIKELE-TILETQKVERSHSAKLEQ— DILEKESIILKLERNLKEFQEHLQDS- VKN 147 

Query: 791 RKRVNENELQ-QDEPPAKKGSIHVSSAITEDQKKSEEV-RPNIAEI-EDIRVLQENNEGL 847 

K +N EL+ ++E ++ + + +++ EE R ++ E++ + L 

Sbjct: 148 TKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNL 207 

Query: 848 RAFLLTIENELKNEKEEKAELNKQI VHFQQELSLSEKKNLTLSKEVQQI QSNYDI 902 

+A L E + + KE+ + KQI Q+E+S+ ++ L ++ ++ Q + ++ 

Sbjct: 208 KADLQRKEEDYADLKEKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL 267 

Query: 903 AIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQI 961 

+ + +Q+ K Q +K+ + +EA + + I+M ++E +T Q+ 

Sbjct: 268 DMKQRTIQQLKEQLNNQKVEEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQV 327 

Query: 962 SNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRI — QEPNRENSFHSSIEA 1019 

L + L+ E+ L+ N + + + N ++ S + 

Sbjct: 328 LEAKLEEVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQK 387 

Query: 1020 IWEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQ — DDLLKEK 1077 

+ K+ ++ Q +E E K E+K Y ++ R +++++ + L EK 

SbjCt: 388 YNADRKKWLEEKMMLITQAKEAENIRNK EMKKYAEDRERFFKQQNEMEILTAQLTEK 444 

Query: 1078 ETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVER 1137 

+ + +Q+ +EE + L++Q++ ++ + + ++ ++ET + K +R 

Sbjct: 445 DSDLQKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKR 504 

Query: 1138 SHSAKLEQDI LEKES I ILKLERNLKEFQEHLQDS VKNTKDLNVKELKLKEEITQLT 1193 

SA ++ E S ++ RN E + DS +N + + +L+ + T L 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQ 564 

Query: 1194 NNLQDMKH LLQLKEEEEETNRQETEKLKEEL-SASSARTQNLKADLQRKEEDYADLK 1249 

N +KH + + + ++++ +++E+L + + + +L+ D + 

Sbjct: 565 PNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSS 624 

Query: 1250 EKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL-DMKQRTIQQLKEQL 1308 

K + K 1+ K+ +R + + I +N KKK Q+ D Q + L+ + 

Sbjct: 625 VK-KEQKVAIRPSSKKTYSLRSQASI — IGVNLATKKKEGTLQKFGDFLQHSPSILQSKA 681 

Query: 1309 NNQKVEEAI QQYERACKDLNVKEKI I EDMR 1338 

+K+ E+ + + + + KE + + R 
Sbjct: 682 — KKIIETMSSSKLSNVEAS-KENVSQPKR 708 






Score = 165 (24.8 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 140/626 (22%), Positives = 271/626 (43%)

Query: 536 VEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEK- 594
+EELE E E K +D + L+E + H+ + LL EL ++L E +EK
Sbjct: 11 IEELEQQIEKLQAEVKGY-KDENNRLKEKE HKNQDDLLKEKETLIQQLKEELQEKN 65

Query: 595 LTLEFKIREEVT QEFTQYWAQREADFKE--TLLQEREILEENAERRLAIFKDLVG 647
+TL+ +I+ V E TQ +A KE T+L+ +++ E + +L +D++
Sbjct: 66 VTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV-ERSHSAKLE--QDILE 122

Query: 648 KCDT REEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENE 704
K E K+ ++ + T L +K ++K E+ + L K L+ +E E
Sbjct: 123 KESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEE 182

Query: 705 SDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSS 764
++ QE E +++ + R + L + +KE+ ++ ++ K K+S
Sbjct: 183 EETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQK-EVSV 241

Query: 765 LIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKS 824
+ +KL+ + E+ K K CS+ + + +QQ + V AI + ++
Sbjct: 242 MRDEDKLLRIKINELEK--KKNQCSQELDMKQRTIQQLKEQLNNQK--VEEAIQQYERAC 297

Query: 825 EEVRPNIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEK 884
+++ IED+R+ E E + + + L+ + EE L ++ ++ + + + E
Sbjct: 298 KDLNVKEKIIEDMRMTLEEQEQTQ VEQDQVLEAKLEEVERLATELEKWKEKCNDLET 354

Query: 885 KNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNE-IETATRSITN N 938
KN S + + ++N D+ + +L + + QE E+K + +E IT N
Sbjct: 355 KNNQRSNK--EHENNTDV-LGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAEN 411

Query: 939 VSQIKLMHTKIDELRTLDSVSQISNIDL-LNLRD--LSNGSEEDNLPNTQLDLLGNDYLV 995
+ ++ D R +++ + L +D L EE + L++ +
Sbjct: 412 IRNKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALIS 471

Query: 996 SKQVKEYRIQEPNRENSFHSSIEA-IWE-ECKEIVKASSKKSHQIEELEQQIEKLQAEVK 1053
S K+ I++ RSSIEI++KIA KQEL E+ +++
Sbjct: 472 SNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKL-QTEPLSTSFEISRNKIE 530

Query: 1054 GYKDENNRLKEKEHKNQDDLLKEKE TLIQQLKEELQEKNVTLDVQIQHVVEGKRA 1108
+ + +Q + E T +Q K ++ T V ++ KR
Sbjct: 531 DGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRK 590

Query: 1109 LSELTQG-VTCYKAKIKELETILETQ-KVERSHSAKLEQDILEKES 1152
+E+ + V C K T L+ +R+ S K EQ + + S
Sbjct: 591 SNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPS 636

Score = 143 (21.5 bits), Expect = 1.3e-05, P = 1.3e-05
Identities = 164/684 (23%), Positives = 304/684 (44%)

Query: 295 QKRKMLR-LSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASS 349
+K +++ L ++++ + D+Q V + K A L G+ +L
Sbjct: 49 EKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV 108

Query: 350 -RSHSI-FTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGE-RLRETGNINTS 406
RSHS IL+ E + +E LS+KNE +L+E T+
Sbjct: 109 ERSHSAKLEQDILEKESIILKLERNLKEFQE-HLQDSVKNTKDLNVKELKLKEEITQLTN 167

Query: 407 LLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDET 466
L K + LK E+ +Q + +L+ N K + + Y E
Sbjct: 168 NLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADL QRKEEDYADLKEK 224

Query: 467 LNVLKFSAIAQKVCVPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSL 526
L K I Q V ++ +DKL +K ++ + N S+ L++K+ TI
Sbjct: 225 LTDAK-KQIKQ-VQKEVSVMRDEDKLLR-IKINE-LEKKKNQCSQELDMKQRTIQQLKEQ 280

Query: 527 EDLMEDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLKK 585
+ + E+ +++ E A + NV+ K++ ED+ TLEE + + E+ ++L+ +E++++
Sbjct: 281 LNNQKVEEAIQQYERACKDLNVKEKII-EDMRMTLEEQEQ--TQVEQDQVLEAKLEEVER 337

Query: 586 KLIN-EK-KEKLT-LEFKIREEVTQEFTQYWAQREADFKETLLQEREILEE NAERR 638
EK KEK LE K + +E + K T LQ+ E+ E NA+R+
Sbjct: 338 LATELEKWKEKCNDLETKNNQRSNKEHEN NTDVLGKLTNLQD-ELQESEQKYNADRK 393

Query: 639 LAIFKDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEEL 698
+ + ++ T+ + A++I K E ++ E F Q + E+ +L + +L
Sbjct: 394 KWLEEKMM--LITQAKEAENI-RNK-EMKKYAEDRERFFKQ-QNEMEILTAQLTEKDSDL 448

Query: 699 KKRENESDSLIQELETSNKKIITQN-QR IKELINIIDQKEDTINEFQNLKSHMENTF 754
+K E D It LE K +I+ N Q+ I++L II + + ++K ++
Sbjct: 449 QKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKRISSA 508

Query: 755 KCNDKADTSSLIINNKLICN — ETVEVPKDSKSKICSERK RVNENELQ-QDEP--PA 806 

DK T L + ++ N E V DS ++ +E R + EL+ Q P P 

SbjCt: 509 D-PDKLQTEPLSTSFEISRNKIEDGSVVLDS-CEVSTENDQSTRFPKPELEIQFTPLQPN 566 

Query: 807 KKGSIH — VSSAITEDQKKSEEVRPNI AEIEDIRVLQENNEGLRA FLLTIENELKNE 861 

K H +++T K+ + + NE + ++ + N R F+++ + 
Sbjct: 567 KMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVK 62 6 

Query: 862 KEEKAEL NKQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQE 918 

KE+K + +K+ + + S+ NL K+ +Q D + +SK ++ 

Sbjct: 627 KEQKVAIRPSSKKTYSLRSQASIIGV-NLATKKKEGTLQKFGDFLQHSPSILQSKAKKII 685 

Query: 919 EKIM — KLSNEIETATRSITNNVSQIKLMHTKI — DELRT-LDSVSQISNID 965 

E + KLSN +E + HVSQ K K+ E+ + +D Q+ +D 

Sbjct: 686 ETMSSSKLSN-VEASKE NVSQPKRAKRKLYTSEISSPI DI SGQVILMD 732 

Score = 133 (20.0 bits), Expect = 1.6e-04, P = 1.6e-04
Identities = 94/426 (22%), Positives = 188/426 (44%) 

Query: 527 EDLM-EDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLK 584 

+DL+ E E L+++L+ + +NV LD + +E +A + I++L+ 

Sbjct: 44 DDLLKEKETLIQQLKEELQEKNVT LDVQIQHVVEGKRALSELTQGVTCYKAKIKELE 100 

Query: 585 KKLINEKKEKLTLEFKIREEVTQ-EFTQYWAQREA-DFKETLLQEREILEENAERRLAIF 642 

L +K E+ + K+ +++ + E +R +F+E L + ++ + L + 

Sbjct: 101 TILETQKVER-SHSAKLEQDILEKESI ILKLERNLKEFQEHLQDSVKNTKDLNVKELKL- 158 

Query: 643 KDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRE 702 

K+++ +K+ K E EE + ++K EL+ + K +L+++E 

Sbjct: 159 KEEITQLTNNLQDMKHLLQLKEEEEETN RQETEKLKEELSASSARTQNLKADLQRKE 215 

Query: 703 NESDSLIQELETSNKKIITQNQRIKELINI IDQK-2DTINEFQNLKSHMENTFKCNDKA- 760 

+ L ++L T KK I Q Q+ ++ D+ INE + K+ + 

Sbjct: 216 EDYADLKEKL-TDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTI 274 

Query: 7 61 DTSSLIINNKLICNETVE VPKDS — KSKICSE-RKRVNENE LQQDEPPAKKGS 810 

+NN+ + E ++ KD KKI+R+EE ++QD+ K 

Sbjct: 275 QQLKEQLNNQKV-EEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLE 333 

Query: 811 IHVSSAITEDQKKSEEVRP-NIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELN 869 

V TE +K E+ + ENN + L +++EL+ E E+K + 

Sbjct: 334 -EVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQ-ESEQKYNAD 391 

Query: 870 KQIVHFQQELSLSEKKNLTL3KEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNEIE 929 

++ ++++ L +T +KE + I++ + K E E+ K NE+E 

Sbjct: 392 RK-KWLEEKMML ITQAKEAENI RNK EMKKYAEDRERFFKQQNEME 435 

Query: 930 TATRSITNNVSQIKLMHTKI DEL 952 

T +T S ++ + D+L 

Sbjct: 436 ILTAQLTEKDSDLQKWREERDQL 458 
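Because several of the HSP header lines in this report are damaged, it can help to re-derive the percentages from the raw counts. The sketch below is illustrative only: the function name is hypothetical, and it assumes (consistently with every intact header in this entry, for example 253/542 reported as 46%) that the reported percentages are truncated rather than rounded.

```python
import re

HSP_STATS = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def check_hsp_stats(line: str) -> dict:
    """Parse an Identities/Positives line and test the reported percentages."""
    m = HSP_STATS.search(line)
    if m is None:
        raise ValueError("not an Identities/Positives line")
    ident, ident_len, ident_pct, pos, pos_len, pos_pct = map(int, m.groups())
    return {
        "identity": ident / ident_len,
        "positives": pos / pos_len,
        # Integer division truncates, matching the assumed reporting convention.
        "consistent": (100 * ident // ident_len == ident_pct)
        and (100 * pos // pos_len == pos_pct),
    }


stats = check_hsp_stats("Identities = 94/426 (22%), Positives = 188/426 (44%)")
print(stats["consistent"])
```

A header whose counts and percentages disagree under this check is a candidate for OCR damage in one of the numbers.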

Pedant information for DKFZphtes3_35b4, frame 3

Report for DKFZphtes3_35b4.3

[LENGTH] 1780
[MW] 206176.77
[pi] 5.60
[HOMOL] TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase phosphoprotein-1 mRNA, partial cds. 0.0
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YEL061c] 2e-37
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YEL061c] 2e-37
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YEL061c] 2e-37
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YEL061c] 2e-37
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-30
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-30
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 3e-23
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-15
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 2e-14
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-07
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-07
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 3e-06
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 2e-05
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YAL035W] 2e-04
[FUNCAT] r general function prediction [M. jannaschii, MJ1254] 0.001
[BLOCKS] BL00387A
[BLOCKS] BL00411H
[BLOCKS] BL00411G
[BLOCKS] BL00411F
[BLOCKS] BL00411E Kinesin motor domain proteins
[BLOCKS] BL00411D Kinesin motor domain proteins
[BLOCKS] BL00411C Kinesin motor domain proteins
[BLOCKS] BL00411B Kinesin motor domain proteins
[BLOCKS] BL00411A Kinesin motor domain proteins
[SCOP] d2kin.l 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 2e-68
[SCOP] d2tmab_ 1.105.4.1.1 Tropomyosin (rabbit (Oryctolagus cuniculus) 4e-05
[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 2e-09
[EC] 3.6.1.32 Myosin ATPase 5e-25
[PIRKW] nucleus 4e-27
[PIRKW] phosphotransferase 3e-16
[PIRKW] duplication 6e-20
[PIRKW] citrulline 6e-18
[PIRKW] tandem repeat 4e-24
[PIRKW] heterodimer 3e-28
[PIRKW] endocytosis 1e-23
[PIRKW] heart 1e-17
[PIRKW] transmembrane protein 2e-28
[PIRKW] serine/threonine-specific protein kinase 3e-16
[PIRKW] zinc finger 1e-23
[PIRKW] surface antigen 2e-16
[PIRKW] DNA binding 1e-25
[PIRKW] metal binding 1e-23
[PIRKW] muscle contraction 4e-24
[PIRKW] heterotetramer 4e-24
[PIRKW] acetylated amino end 2e-19
[PIRKW] actin binding 5e-25
[PIRKW] mitosis 3e-58
[PIRKW] microtubule binding 3e-58
[PIRKW] ATP 3e-58
[PIRKW] thick filament 4e-24
[PIRKW] phosphoprotein 9e-29
[PIRKW] leucine zipper 1e-12
[PIRKW] skeletal muscle 8e-24
[PIRKW] disulfide bond 1e-12
[PIRKW] heterotrimer 1e-29
[PIRKW] calcium binding 6e-18
[PIRKW] alternative splicing 4e-21
[PIRKW] P-loop 2e-63
[PIRKW] coiled coil 3e-58
[PIRKW] heptad repeat 1e-25
[PIRKW] methylated amino acid 4e-24
[PIRKW] peripheral membrane protein 1e-23
[PIRKW] dimer 1e-12
[PIRKW] cardiac muscle 1e-17
[PIRKW] hydrolase 5e-25
[PIRKW] microtubule 6e-15
[PIRKW] muscle 7e-23
[PIRKW] membrane protein 6e-20
[PIRKW] GTP binding 8e-22
[PIRKW] EF hand 6e-18
[PIRKW] cell division 1e-25
[PIRKW] cytoskeleton 4e-24
[PIRKW] hair 6e-18
[PIRKW] Golgi apparatus 8e-24
[PIRKW] calmodulin binding 1e-23
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-
[SUPFAM] myosin motor domain homology 5e-25
[SUPFAM] alpha-actinin actin-binding domain homology 1e-13
[SUPFAM] kinesin-related protein KIP1 9e-27
[SUPFAM] kinesin-related protein CIN8 4e-36
[SUPFAM] kinesin heavy chain 4e-24
[SUPFAM] plectin 1e-13
[SUPFAM] trichohyalin 6e-18
[SUPFAM] kinesin-related protein KIF3 1e-29
[SUPFAM] kinesin-related protein KIF2 3e-20
[SUPFAM] ribosomal protein S10 homology 1e-13
[SUPFAM] giantin 8e-24
[SUPFAM] protein kinase homology 3e-16
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13
[SUPFAM] kinesin-related protein unc-104 8e-26
[SUPFAM] human early endosome antigen 1 1e-23
[SUPFAM] unassigned kinesin-related proteins 1e-28
[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17
[SUPFAM] myosin heavy chain 5e-25
[SUPFAM] conserved hypothetical P115 protein 4e-20
[SUPFAM] centromere protein E 5e-24
[SUPFAM] calmodulin repeat homology 6e-18
[SUPFAM] kinesin-related protein KLP61F 1e-25
[SUPFAM] hypothetical protein MJ0914 3e-12
[SUPFAM] kinesin-related protein MKLP-1 2e-63
[SUPFAM] pleckstrin repeat homology 8e-26
[SUPFAM] hypothetical protein MJ1322 4e-13
[SUPFAM] kinesin-related protein KIF1B 3e-28
[SUPFAM] kinesin motor domain homology 2e-63
[SUPFAM] kinesin-related protein KLPA 7e-25
[SUPFAM] kinesin-related protein nodA 1e-12
[SUPFAM] kinesin-related protein Eg5 5e-30
[PROSITE] ATP_GTP_A 1
[PFAM] Kinesin motor domain
[KW] Irregular
[KW] 3D
[KW] LOW COMPLEXITY 7.53 %
[KW] COILED COIL 19.78 %



SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ 

SEG 

COILS 

3kar- 

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFG 

SEG 

COILS 

3kar- 

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF 

SEG 

COILS 

3kar- 

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL 

SEG 

COILS 

3kar- 

SEQ TNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML 

SEG 

COILS 

3kar- EEEEEEEEEEETTEEEETTTCC CCEE 

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL 

SEG 

COILS 

3kar- eeetttte-eeeettcceeeccggghhhhhhhhhhhhccttttchhhhhhceeeeeeeee 

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS 

SEG 

COILS 

3kar- E — EETTTTCEEEEEEEEEECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT 

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVC

SEG 

COILS 

3kar- TTTT — TCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHH

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE 
SEG xxxxxxxxxxxxxxxxxx

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC 

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS CCCC 

3kar- 

SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY 

SEG xxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- 

SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL 

SEG xxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG . xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN 

SEG 
COILS

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ VVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 



Prosite for DKFZphtes3_35b4.3

PS00017 152->160 ATP_GTP_A PDOC00017

Pfam for DKFZphtes3_35b4.3
HMM_NAME Kinesin motor domain

HMM *RCRPlNeREindgcscvVQWPpWtGyktvhnghegds phks . 

R+RP+ + E++ + +V + ++++ ++ + ++ 
Query 64 RIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPI VDDcFhGYNCTI FAYGQTGSGKTYTM 

F+F +VF++++TQ++ ++4 + V+D+++G IF+YG T SGKTYT 
Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHmGIIPRcCHDIFdrldkfqekDhdFW 

G +++GI + PR++-+ +FD++ + +++ 

Query 163 QG TEENIGILPRTLNVLFDSLQERL-YTKMNLKPHRSREYLRLSSE 207 

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257 

HMM hVkCSYMEI YNEelYDLLCPnP . . . qhMkpLnlHEHPN 

+V +S++EIYNE+IYDL +P++ Q++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYI YDLFVPVSSKFQKRKMLRLSQDVK 307 

HMM MGpYVqGCTEf HVcS YeDachWIWqGnknRHVAaTnMNdhSSRSHt I FTI 

++++++ V +A +++ +G K+ VA T++N SSRSH+IFT+ 

Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk . qcdehvcHSKMNLVDLAGSERvnrTGAEGQRIKEGcNINqSL 

++ Q + + +++S +4L DLAGSER+ +T+ EG RL+E +NIN SL 
Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407 

HMM ttLGnVInaLaDgqTKYmYgghgHIPYRDSKLTWlLQDSLGGNcKTcMIA 
+TLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI+ 
Query 408 LTLGKCINVLKNSE KSKFQQHVPFRESKLTHYFQSFFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnlkNkPQINEDPca* 

+1+ + Y+ETL++L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491

group: metabolism

DKFZphtes3_35b5 encodes a novel 466 amino acid protein with similarity to the bovine accessory
subunit for vacuolar ATPase and to rat C7-1 protein.

The vacuolar proton-ATPase (V-ATPase) translocates protons into intracellular organelles or
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 seems
to be enriched in the frontal cortex of aged adult rats.

The novel protein can find application in modulating V-ATPase activity in endocytic and
secretory organelles.



strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain A

complete cDNA, complete cds, potential start at bp 8, EST hits;
matches perfectly to 154197 hypothetical protein, but possesses 186
additional aa at the N-terminus

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 
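The stated start position (bp 8) and peptide length (466 amino acids) together predict where the stop codon should fall in the sequence listing for this clone. The following is a rough sketch of that check; the helper names are hypothetical, and the two rows are copied from the listing for this clone (the row numbered 1 and the row numbered 1401, whose position label is split by OCR in the source).

```python
def stop_codon_position(start_bp: int, peptide_length: int) -> int:
    """1-based position of the first base after the last sense codon."""
    return start_bp + 3 * peptide_length


def bases_at(row_start: int, row: str, position: int, count: int = 3) -> str:
    """Slice `count` bases at 1-based `position` from a listing row that begins at `row_start`."""
    seq = row.replace(" ", "")
    i = position - row_start
    if i < 0 or i + count > len(seq):
        raise IndexError("position lies outside this row")
    return seq[i:i + count]


row_1 = "GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG"
row_1401 = "TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG"

start = 8
stop = stop_codon_position(start, 466)
start_codon = bases_at(1, row_1, start)
stop_codon = bases_at(1401, row_1401, stop)
print(start_codon, stop, stop_codon)
```

Under these assumptions the predicted stop position lands on an in-frame stop codon, which is consistent with the annotation of a complete coding sequence.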



1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 
101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 
151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 
201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 
251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 
301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 
351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 
401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 
451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 
501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 
551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 
601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 
651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 
701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 
751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 
801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 
851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 
901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 
951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 



BLAST Results 



No BLAST result 
Medline entries 



95014142: 

A novel accessory subunit for vacuolar H(+)-ATPase from chromaffin 
granules. 

97215246: 

Identification of a rat brain gene associated with aging by 
PCR differential display method. 



Peptide information for frame 2 



ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
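
The ORF coordinates and the peptide length reported for each clone are mutually consistent if the coordinates are read as 1-based and inclusive, with the stop codon excluded. The short Python sketch below checks that arithmetic. The function name `orf_peptide_length` is a hypothetical helper introduced here for illustration, and the coordinate convention is an assumption inferred from the reported numbers, not something the report states.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: end_bp is the last base of the last sense codon,
    so the stop codon is not part of the span.
    """
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3 bases")
    return n_bases // 3


# ORF from 8 bp to 1405 bp, as reported for frame 2 of DKFZphtes3_35b5
print(orf_peptide_length(8, 1405))
```

Under this reading, a span that is not a multiple of three would point to a transcription error in one of the coordinates.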



1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 
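
The start of this peptide can be cross-checked against the cDNA listing by translating from the stated ORF start. The sketch below is a minimal illustration that assumes the standard genetic code; `translate`, `CODON_TABLE` and `FIRST_LINE` are names introduced here and are not part of the original report.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna: str, start_bp: int) -> str:
    """Translate cDNA from a 1-based start position until a stop codon or the end."""
    seq = cdna.replace(" ", "").upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the ORF
            break
        peptide.append(aa)
    return "".join(peptide)


# First 50 bases of the DKFZphtes3_35b5 insert, copied from the listing.
FIRST_LINE = "GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG"
print(translate(FIRST_LINE, 8))  # prints MATARVRMGPRCAQ
```

The output agrees with the first 14 residues of the frame 2 peptide listed above.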

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 

TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus 
norvegicus C7-1 protein (C7-1) mRNA, complete cds., N = 1, Score = 
2088, P = 3.8e-216 

PIR:A55116 vacuolar ATPase (EC 3.6.1.-) chain Ac45 - bovine, N = 1, 
Score = 2011, P = 5.5e-203 

PIR:I54197 hypothetical protein - human, N = 1, Score = 1464, P = 
5.1e-150 



>TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus 
C7-1 protein (C7-1) mRNA, complete cds. 
Length = 463 

HSPs: 



Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216 
Identities = 408/463 (88%), Positives = 426/463 (92%) 
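
The percentages printed next to the identity and positive counts in these HSP summaries appear to be truncated rather than rounded (for example, 21/68 is reported as 30% and 319/628 as 50% elsewhere in this listing). A minimal sketch of that calculation follows; `blast_percent` is a hypothetical helper name, and the truncation rule is an inference from the reported values, not a documented property of the search program.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching columns, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // aligned


# Identities = 408/463 and Positives = 426/463 for the rat C7-1 HSP
print(blast_percent(408, 463), blast_percent(426, 463))  # prints 88 92
```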



Query: 4 ARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTHEGH 63 
+R+R G R A LW + LSL A AAA AAEQQVPLVLWSSDRDLWAP ADTHEGH 
Sbjct: 8 SRIRTGTRWAPVLW------LLLSLVAVAAAVAAEQQVPLVLWSSDRDLWAPVADTHEGH 61 

Query: 64 ITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 123 
ITSD+QLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 
Sbjct: 62 ITSDMQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 121 

Query: 124 PSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYTASS 183 
PSSLVLPAVDWYA+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLIRLPYTASS 
Sbjct: 122 PSSLVLPAVDWYAISTLTTYLQEKLGASPLHVDLATLKELKLNASLPALLLIRLPYTASS 181 

Query: 184 GLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQLLQK 243 
GLMAPREVLTGNDEVIGQVLSTL+SEDVPYTAALTAVRPSRVARDVA+VAGGLGRQLLQ 
Sbjct: 182 GLMAPREVLTGNDEVIGQVLSTLESEDVPYTAALTAVRPSRVARDVAMVAGGLGRQLLQT 241 

Query: 244 QPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWNDSFA 303 
Q SP IHPPVSYNDTAPRILFWAQNFSVAYKD+W+DLT LTFGV+ LNLTGSFWNDSFA 
Sbjct: 242 QVASPAIHPPVSYNDTAPRILFWAQNFSVAYKDEWKDLTSLTFGVENLNLTGSFWNDSFA 301 

Query: 304 RLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGPSIY 363 
LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+FN SQVTGPSIY 
Sbjct: 302 MLSLTYEPLFGATVTFKFILASRFYPVSARYWFTMERLEIHSNGSVAHFNVSQVTGPSIY 361 

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423 

SFHCEYVSSLSKKGSLLV PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW 

Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420 

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466 

MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV 
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463 



Pedant information for DKFZphtes3_35b5, frame 2 



Report for DKFZphtes3_35b5.2 

[LENGTH] 466 
[MW] 51621.44 
[pI] 5.73 
[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds. 0.0 
[PIRKW] hydrolase 0.0 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] SIGNAL_PEPTIDE 38 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 11.59 % 



SEQ MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH 

SEG xxxxxxxxx 

PRD ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc 

MEM 

SEQ EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL 

SEG 

PRD ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc 

MEM 

SEQ DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT 

SEG xxxxxxxxxxxxxxx . . . 

PRD ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc 

MEM 

SEQ ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL 

SEG xxxxxxxxxxxxxxxxxxxx . . 

PRD cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh 

MEM 

SEQ LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND 

SEG 

PRD hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc 

MEM 

SEQ SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP 

SEG 

PRD hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc 

MEM 

SEQ SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP 

SEG xxxxxxxxxx 

PRD ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc 

MEM MMMMMM 

SEQ GIWMGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_35b5.2 

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001 
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001 
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001 
PS00001 292->296 ASN_GLYCOSYLATION PDOC00001 
PS00001 299->303 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00004 375->379 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005 
PS00005 374->377 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007 
PS00008 102->108 MYRISTYL PDOC00008 
PS00008 103->109 MYRISTYL PDOC00008 
PS00008 200->206 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 314->320 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_35b5.2) 
group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to 
interleukin-7. 

Due to the close relationship to human interleukin-7, the novel interleukin is expected to act 
as a new growth factor for human B lineage cells. Additionally, the protein should induce the 
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and 
subsequently induce both cytotoxic T-cell- and lymphocyte-activated killer cells. 

This new interleukin could find clinical application in a variety of conditions of 
hematolymphopoietic failure and different tumours, because of its recruitment of B cell 
lineage cells, cytotoxic T-cell- and lymphocyte-activated killer cells. 



similarity to interleukin-7 precursor 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 



1 GGATGAAAGT GATTTAATTC ATTTTTAGAA TTTTTTTTTT GTTTTGTTTT 

51 AGCAACATGC TGAACAACTA ATTTACTTTA AAAATAAGCC AGTTAAAACA 

101 AAGGACGCTA AGCCCAAGTG GGGGGCAATA TTAGTCAGGA TCTTTGGGGT 

151 CTAATTCCAG ACCAACTTTC AGAAGCACTT CTTTGTCTCT GTTCTCACCT 

201 CTGCTGTCCC TCTCTTCCCT CATCCCCTAA GAGAGACAAA GATAAAAGCC 

251 CACCTGCATC CCTAAGTCTT ACTGAGATCA GCCACCCCAG GGGAGAGAAA 

301 CTGGATCTAC TTACAGCCAC CCCCTGTTTC CATCCATATA CTTACTTCCC 

351 CCAATTTGCA TGTGATTATG GAAACAAGTC ATGCTCATGA AAGCAACTGT 

401 AAAATAAAAG GTTATGGAGT AGTTCAGCAA CTTCTTCACA GCCAGCTTTG 

451 TGGAGCTGGG GAGGACTTAG GGCCCATTGG AGTCTCTTAT GTGTACAGCT 

501 TCAGGGCTGT CCCTTTCAGT TTGATTTTAA GCAATGCCTC ACTTCATAGC 

551 TTAGGGGGTA AGGATTCCAT TCAGGTAGGT TGTCTAAAGG AACTAATGGG 

601 ACCTCTCAGT GAATTAGCTG ACCAGATTTT AGGAAATCTT TTTAATTTCT 

651 ATGATTTTCC TTCTCACATT TTGAAATGGT AAAATTGACT GGAAATAATT 

701 TTTCTTGGTG CCTTATTGGT TTTCCTTGCA AACCTTTCTC ATATTTTCTC 

751 ATGACCATTG CCAGTGACCA AGGCCCATGT GTGTGTTGTG TGTAATTGTG 

801 GGCATGTACA AGCTTAAATA ACGTGCCGAC AGCACTGTTT CAAAGTTGGT 

851 ATTCATTAGG CTGTTGCCTC CTGGGCTGGA GCTGCGCTAA TCCTGACACC 

901 GGCTGCCAGG AGAAAACCTC ATGGATCACA CACCAAACCT TAATAACAGC 

951 ATCCGTGACC TGCACTCTCC AGTACAGAAT GGGAACCCCA GAGCTAGGAA 

1001 ATGTAGTTGT ATATTTTAAT GAACTGCTAC CCCAGCCAAA GAAGCTTCTT 

1051 TCACTTTTGT GCTCTACAGA AAGCCCAAGG GGGGTAGGAG GGACAGAGCT 

1101 TTGAATAACT GCTTTCTAAC ACTAAATGTG GCCAACAGGA CAGAGCACAT 

1151 CACACGTATA GGCAGGTGTG AGGGACAGTG GCTAAGAATT GCCTGCTCCC 

1201 TCTGCATGCT CTTTCTTGTT TCCAAAGTCC AATCAAGTGA TCCTGGGAAA 

1251 CAAATCTGTC TGGATTGCGG AGGGTGGTTC TGAAAGAACT GCCAAGACGT 

1301 TAAAGAAGGG TGAAGAGTAG GCAGAATATA AGTAGCTAAC CTGAGTCAAG 

1351 ACTCTCAAAA GCTAGCAGCC TGATGACAAT AGGATTTATT TCAGCCAGGA 

1401 TAGTGTCTGT CTGTGAGTGC ATCATTTTAA GACAGTATGA CTTCATGTTG 

1451 TTACAAACTA TGTATAGTAT GTATGTTTTG TGGGTTGTAT ATATACATAA 

1501 TATATATTAT ATATATATAT GAGAGATTTG GTGACTTTTG ATACGGGTTT 

1551 GGTGCAGGTG AATTTATTAC TGAGCCAAAT GAGGCACATA CCGAGTCAGT 

1601 AGTTGAAGTC CAGGGCATTC GATACTGTTT ATGATTTCCA TATATGTATA 

1651 GTGCCTATCC CATGCTGTAG TCACTGTTAT GTTAAATCCA GAAGTTACAC 

1701 TAGAGCCAGC GATACTTTAT TTGTAGACAA TCAATTTGAA TCCATATGTT 

1751 ATTACTGGCA GATGATACAT GATTACAGTT CTGAATCTGT AACACTTACA 

1801 AAAGGAAACC CAGAGCAGCT TGATGAGTTT TTGTTTCTGC TTCGTTCCTG 

1851 GGAGTCAGTA GAAACAGCAG TTGTATGTGG TTATGTTAGT CTCAAGATAC 

1901 TTAATTTGTT GACCTTACTT CAGAAAAATT TTGTATGTAT TATATTTGTG 

1951 GGAAGGTAAA ATAATCATTT GAGATTTTTA TCAAATATGA AGATTAGTTA 

2001 TTTATGAAAA ACAAAGAAAT GTCTATTTTT CTTTGTTCCC AATTAATGTA 

2051 GATAAATTTT AAAATGCATT AAAGTAATGG TCCGGAAAAA AAAAA 



BLAST Results 



No BLAST result 
Medline entries 



89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 



Peptide information for frame 2 



ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 



1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score =66, P = 7.0e-01, identities = 21/70, positives = 33/70 



Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score = 
66, P = 0.72 

TREMBL:PADAL1_1 gene: "dall"; P.abies dall mRNA, N = 2, Score = 59, P 
= 0.77 

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score = 
66, P = 0.79 

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus 
radiata MADS-box protein (PrMADS3) mRNA, complete cds., N = 2, Score = 
59, P = 0.94 



>PIR:B32223 interleukin-7 precursor (clone 1) - human 
Length = 133 

HSPs: 

Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 21/68 (30%), Positives = 33/68 (48%) 

Query: 39 VSYVYSFRAVPFSLIL-----SNASLHSLGGK--DSIQVGCLKELMGPLSELADQILGNL 91 
VS+ Y F P L+L S+ + GK +S+ + + +L+ + E+ L N 
Sbjct: 4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 
FNF F HI 
Sbjct: 64 FNF--FKRHI 71 



Pedant information for DKFZphtes3_35e21, frame 2 



Report for DKFZphtes3_35e21.2 

[LENGTH] 104 
[MW] 11339.12 
[pI] 5.87 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 



SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH 
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc 
SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW 
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc 



Prosite for DKFZphtes3_35e21.2 

PS00001 56->60 ASN_GLYCOSYLATION PDOC00001 
PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005 
PS00008 63->69 MYRISTYL PDOC00008 
PS00008 89->95 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_35e21.2) 
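
The Prosite hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below is an illustration only: it rewrites three PROSITE patterns (PS00001 N-{P}-[ST]-{P}, PS00005 [ST]-x-[RK], PS00008 G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}) as regular expressions, and it assumes that each reported range is 1-based with an end coordinate one past the last motif residue, which is how the numbers in the list above appear to behave. The names `PATTERNS`, `scan` and `PEPTIDE` are introduced here and are not part of the original report.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "PS00001": r"N[^P][ST][^P]",              # ASN_GLYCOSYLATION
    "PS00005": r"[ST].[RK]",                  # PKC_PHOSPHO_SITE
    "PS00008": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # MYRISTYL
}

# Frame 2 peptide of DKFZphtes3_35e21, copied from the listing (104 residues).
PEPTIDE = (
    "METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPF"
    "SLILSNASLHSLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSH"
    "ILKW"
)


def scan(peptide: str) -> dict:
    """Return {accession: [(start, end), ...]} with 1-based start and exclusive end."""
    hits = {}
    for accession, pattern in PATTERNS.items():
        # A lookahead keeps overlapping matches, which motif scans must report.
        found = re.finditer(f"(?=({pattern}))", peptide)
        hits[accession] = [(m.start() + 1, m.start() + 1 + len(m.group(1))) for m in found]
    return hits


for accession, ranges in scan(PEPTIDE).items():
    for start, end in ranges:
        print(f"{accession} {start}->{end}")
```

Because these patterns are short and degenerate, hits of this kind occur frequently by chance and are usually read as weak evidence at most.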
DKFZphtes3_35g6 



group: testes derived 



DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



strong similarity to R27216_1 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="15" 
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 



1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 
51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 
101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 
151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 
201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 
251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 
301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 
351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 
401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 
451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 
501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 
551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 
601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 
651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 
701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 
751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 
801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 
851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 
901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 
951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 
1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 
1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 
1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 
1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 
1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 
1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 
1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 
1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 
1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 
1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 
1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 
1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 
1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 
1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 
1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 
1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 
1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 
1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 
1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 
1951 AAGAGATGGG TCAGTATTCC TACAGAATTC TTATTAACTC AAATAACTAA 
2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 
2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 
2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 
2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 
2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 
2251 AACTGCACCA CTATACCTGT CTCTCTGTGT GGGGGACACT GCTGATGATT 
2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 
2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 
2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 
2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 
2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 
2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 
2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 
2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 
2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 
2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 
2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 
2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 
2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 
2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 
3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 
3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 
3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 
3151 TAAATATATA TATATATAAA AAAAAAA 



BLAST Results 



Entry G37753 from database EMBL: 
SHGC-63477 Human Homo sapiens STS genomic. 
Score = 1627, P = 3.0e-66, identities = 327/329 

Entry G37752 from database EMBL: 
SHGC-63476 Human Homo sapiens STS genomic. 
Score = 1578, P = 6.2e-64, identities = 320/324 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 



1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA 
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG 
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT 
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL 
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA 
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV 
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR 
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC 
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA 
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT 



BLASTP hits 



Entry AC005306_2 from database TREMBL: 

product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, 
complete sequence. 

Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297 

Entry CEF38H4_9 from database TREMBLNEW: 

gene: "F38H4.7"; Caenorhabditis elegans cosmid F38H4 

Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446 

Entry AC004678_1 from database TREMBL: 

product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094, 
complete sequence. 

Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137 



Alert BLASTP hits for DKFZphtes3_35g6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35g6, frame 3 



Report for DKFZphtes3_35g6.3 

[LENGTH] 482 
[MW] 52771.47 
[pI] 5.79 
[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence. 1e-142 




WT.ninT^n lrpt"at"o ^n/H V-int-\/»'jat~ei lr "i n ^ f aTni 1 »/ nrnhoi ns 




POZ domain homology 3e-08 


[SUPFAM] A55R protein middle region homology 5e-06 
[SUPFAM] A55R protein 5e-06 
[SUPFAM] A55R protein carboxyl-terminal homology 5e-06 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.20 % 



SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE 

SEG xxxxxxxxxxx 

PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee 

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA 

SEG 

PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch 

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh 

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV 

SEG 

PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceeecccccccccccccchhhhh 

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS 

SEG 

PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee 

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN 

SEG 

PRD eeeocccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc 

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF 

SEG xxxxxx 

PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec 

SEQ YT 
SEG 

PRD cc 



Prosite for DKFZphtes3_35g6.3 

PS00001 394->398 ASN_GLYCOSYLATION PDOC00001 
PS00001 466->470 ASN_GLYCOSYLATION PDOC00001 
PS00004 357->361 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 387->391 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00005 234->237 PKC_PHOSPHO_SITE PDOC00005 
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005 
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005 
PS00005 406->409 PKC_PHOSPHO_SITE PDOC00005 
PS00005 428->431 PKC_PHOSPHO_SITE PDOC00005 
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 115->119 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006 
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006 
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006 
PS00007 430->437 TYR_PHOSPHO_SITE PDOC00007 
PS00008 80->86 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 365->371 MYRISTYL PDOC00008 
PS00008 392->398 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 463->469 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_35g6.3) 
DKFZphtes3_35kl6 



group: metabolism 

DKFZphtes3_35kl6 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA 
synthetases/ligases. 

The novel protein contains a putative AMP-binding domain signature, which is present in 
enzymes that act via an ATP-dependent covalent binding of AMP to their substrate. This 
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1), 
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase. Therefore, it is a 
new fatty acid-CoA synthetase/ligase with unknown substrate. 

The new protein can find application in modulation of fatty acid metabolism and as a new 
enzyme for biotechnological production processes. 



similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at Bp 50, 
few EST hits, seems to be a testis-specific cDNA, 
5 of 6 EST hits are from testis-derived libraries 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490 



1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA 
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC 
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG 
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA 
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT 
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA 
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG 
351 GTTTGGAGCG TTTCCACGCA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG 
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG 
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG 
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC 
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA 
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA 
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG 
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC 
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG 
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG 
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG 
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC 
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC 
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA 
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA 
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT 
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC 
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC 
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC 
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA 
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA 
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG 
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA 
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG 
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC 
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA 
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA 
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC 
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG 
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC 
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA 
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA 
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC 
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA 
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA 
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG 
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG 
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA 
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC 
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT 
2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA 
2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 
2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC 
2501 TTCAGGGTCC AAAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 



1 MTGTPKTQEG AKDLEVDMNK TEVTPRLHTT CRDGEVLLRL SKHGPGHETP 

51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL 

101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH 

151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF 

201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI 

251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA 

301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR 

351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP 

401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK 

451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL 

501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL 

551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD 

601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR 

651 HFVAQKYKKQ IDHMYH 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35kl6, frame 2 

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, P 
= 8.9e-169 

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37RV), 
N = 2, Score = 532, P = 3.6e-62 

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus 
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59 



>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds. 
Length = 634 

HSPs: 



Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169 
Identities = 319/628 (50%), Positives = 440/628 (70%) 



Query: 38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97 
LR+ P + P T+ F E+++++G AL K KWE ++++QYY R+AAK 
Sbjct: 2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59 

Query: 98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157 
+KLGL++ H V ILGFNS EWF +AVG + AGG+ GIY T+S EACQY+ N+++ 
Sbjct: 60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119 

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216 
V+ +QL+KIL I L LKA++ Y+ P K N+Y+ ++FMELG +P+ L+ +I 
Sbjct: 120 VDTQKQLEKILKI-WKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178 

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273 



842 



12/13/10, EAST Version: 2.4.2.1 



WO 01/12659 



PCT/IB00/01496 



++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A G+ D + + + E VVSYLPL 
Sbjct: 179 DTQQPNQCCVLVYTSGTTGNFKGVMLSQDNITWTARYGSQAGDIRPAEVQQEVVVSYLPL 238 

Query: 274 SHIAAQMMDIWVPIKIGALTYFAQADALKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKK 333 

SHIAAQ+ D+W I+ GA FA+ DALKG+LV+TL+EV+PT +GVP++WEKI E +++ 
Sbjct: 239 SHIAAQIYDLWTGIQWGAQVCFAEPDALKGSLVNTLREVEPTSHMGVPRVWEKIMERIQE 298 

Query: 334 NSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHS 393 

+A+S +++K +WA ++ + N G P + R+A LV +KV+ +LG C 

Sbjct: 299 VAAQSGFIRRKMLLWAMSVTLEQNLT-CPGSDLKPFTTRLADYLVLAKVRQALGFAKCQK 357 

Query: 394 FISGTAPLNQETAEFFLSLDIPIGELYGLSESSGPHTISNQNNYRLLSCGKILTGCKNML 453 

G AP+ ET FFL L+I + YGLSE+SGPH +S+ NYRL S GK++ GC+ L 
Sbjct: 358 NFYGAAPMMAETQHFFLGLNIRLYAGYGLSETSGPHFMSSPYNYRLYSSGKLVPGCRVKL 417 

Query: 454 FQQNKDGIGEICLWGRHIFMGYLESETETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIK 513 

Q+ +GIGEICLWGR IFMGYL E +T EAID+EGWLH+GD G+LD GFLY+TG +K 
Sbjct: 418 VNQDAEGIGEICLWGRTIFMGYLNMEDKTCEAIDEEGWLHTGDAGRLDADGFLYITGRLK 477 

Query: 514 EILITAGGENVPPIPVETLVKKKIPIISNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDK 573 

E++ITAGGENVPP+P+E VK ++PIISNAML+GD+ KFLSMLLTLKC ++ + + D 
Sbjct: 478 ELIITAGGENVPPVPIEEAVKMELPIISNAMLIGDQRKFLSMLLTLKCTLDPDTSDQTDN 537 

Query: 574 LNFEAINFCRGLGSQASTVTEMVKQQDPLVYKAIQQGINAVNQEAMNNAQRIEKWVILEK 633 

L +A+ FC+ +GS+A+TV+E+++++D VY+AI++GI VN A I+KW ILE+ 

Sbjct: 538 LTEQAVEFCQRVGSRATTVSEIIEKKDEAVYQAIEEGIRRVNMNAAARPYHIQKWAILER 597 

Query: 634 DFSIYGGELGPMMKLKRHFVAQKYKKQIDHMY 665 

DFSI GGELGP MKLKR V +KYK ID Y 
Sbjct: 598 DFSISGGELGPTMKLKRLTVLEKYKGIIDSFY 629 
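
The percentages in the HSP header above can be reproduced from the raw counts. This is a small Python sketch, assuming the report truncates the percentage to an integer rather than rounding it (an inference from 319/628 being printed as 50%, not a documented property of the BLAST version used). The helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching positions, truncated (not rounded)."""
    return 100 * matches // aligned_length


# Identities = 319/628, Positives = 440/628 for the KIAA0631 hit.
print(blast_percent(319, 628), blast_percent(440, 628))  # 50 70
```

Rounding instead of truncating would give 51% identities, which does not match the value printed in the report.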



Pedant information for DKFZphtes3_35kl6, frame 2 



Report for DKFZphtes3_35kl6.2 



[LENGTH] 666 
[MW] 74344.97 
[pI] 8.67 
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds. 1e-176 
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29 
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23 
[BLOCKS] BL00455 
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49 
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17 
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34 
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08 
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18 
[PIRKW] duplication 6e-07 
[PIRKW] phosphopantetheine 3e-12 
[PIRKW] multifunctional enzyme 3e-06 
[PIRKW] ligase 6e-08 
[PIRKW] acid-thiol ligase 4e-34 
[PIRKW] transmembrane protein 5e-22 
[PIRKW] monooxygenase 9e-17 
[PIRKW] hydrolase 4e-34 
[PIRKW] peroxisome 9e-15 
[PIRKW] antibiotic biosynthesis 3e-12 
[PIRKW] isomerase 6e-08 
[PIRKW] flavonoid biosynthesis 1e-17 
[PIRKW] magnesium 9e-15 
[PIRKW] ATP 5e-22 
[PIRKW] oxidoreductase 9e-17 
[PIRKW] liver 2e-31 
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07 
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34 
[SUPFAM] gramicidin S synthetase I 6e-08 
[SUPFAM] peptide synthetase ppsE 7e-06 
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12 
[SUPFAM] peptide synthetase ppsD 2e-07 
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09 
[SUPFAM] acetate--CoA ligase 8e-10 
[SUPFAM] acetate--CoA ligase homology 4e-54 
[SUPFAM] surfactin synthetase 3e-12 
[SUPFAM] 4-coumarate--CoA ligase 8e-18 
[SUPFAM] short-chain alcohol dehydrogenase homology 8e-07 
[SUPFAM] acyl carrier protein homology 2e-29 
[PROSITE] MYRISTYL 12 
[PROSITE] AMP_BINDING 1 
[PROSITE] AMIDATION 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] AMP-binding enzymes 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 1.80 % 



SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES 
SEG 
1lci- 

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI 
SEG 
1lci- 

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA 
SEG 
1lci- 

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV 
SEG 
1lci- 

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA 
SEG 
1lci- 

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK 
SEG 
1lci- 

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY 
SEG 
1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE 

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET 
SEG 
1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEETTTTCCEETTTHH 

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII 
SEG xxxxxxxxxxxx 
1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEEECHHHHHHHHHHT-TTE 

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD 
SEG 
1lci- EEEEEEE 

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ 
SEG 
1lci- 

SEQ IDHMYH 
SEG 
1lci- 



Prosite for DKFZphtes3_35kl6.2 



PS00001 19->23 ASN_GLYCOSYLATION PDOC00001 
PS00001 246->250 ASN_GLYCOSYLATION PDOC00001 
PS00004 332->336 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005 
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 308->311 PKC_PHOSPHO_SITE PDOC00005 
PS00005 335->338 PKC_PHOSPHO_SITE PDOC00005 
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 558->561 PKC_PHOSPHO_SITE PDOC00005 
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006 
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006 
PS00006 173->177 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 210->214 CK2_PHOSPHO_SITE PDOC00006 
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006 
PS00006 478->482 CK2_PHOSPHO_SITE PDOC00006 
PS00006 591->595 CK2_PHOSPHO_SITE PDOC00006 
PS00007 659->666 TYR_PHOSPHO_SITE PDOC00007 
PS00007 658->666 TYR_PHOSPHO_SITE PDOC00007 
PS00007 597->605 TYR_PHOSPHO_SITE PDOC00007 
PS00008 3->9 MYRISTYL PDOC00008 
PS00008 65->71 MYRISTYL PDOC00008 
PS00008 124->130 MYRISTYL PDOC00008 
PS00008 130->136 MYRISTYL PDOC00008 
PS00008 134->140 MYRISTYL PDOC00008 
PS00008 235->241 MYRISTYL PDOC00008 
PS00008 239->245 MYRISTYL PDOC00008 
PS00008 303->309 MYRISTYL PDOC00008 
PS00008 387->393 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 498->504 MYRISTYL PDOC00008 
PS00008 586->592 MYRISTYL PDOC00008 
PS00009 74->78 AMIDATION PDOC00009 
PS00455 227->239 AMP_BINDING PDOC00427 



Pfam for DKFZphtes3_35kl6.2 



HMM_NAME AMP-binding enzymes 

HMM *TYRELNERANRLARHLRsekGIrPGDiVgIMMDRSMWMIVaMLGIWKAG 
+ + +E +A L+ +G VGI+ +S + ++ G + AG 

Query 82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129 

HMM GAYVPIOPeYPdERIqYMLEDSGArLLITQrh . . . . HmqRIPdemwwvdH 

G +V I +E QY++ ++ ^ >L+++ + + IP++++ + 

Query 130 GLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179 

HMM IiviDWe WddlWWHedeeNpqpWvdPeDLAYI IY 

+I++ + + ++++ + E ++ ++++ A +IY 

Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229 

HMM TSGTTGKPKGVMIEHrNIvNycqWMnWRYgMteeDDRILWFtSDpYWFDa 
TSGTTG PKGVM++H NI+ + +++ +T+ +++ + + ++ A 

Query 230 TSGTTGIPKGVMLSHDNITWI AGAVTKDFKLTDKHETVVS YLP-LSHIAA 278 

HMM SVWDMFWpLLnGaTLYIpPeEtRrDPerWWqYIqRHglTWWylTPSMFRM 

+++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++ 

Query 279 QMMDIWVPIKIGALTYFAQADAL--KGTLVSTLKEVKPTVFIGVPQIWEK 326 

HMM LMpd 

+ + 

Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 376 

HMM pSLRhVMFgGEpLsPehWdWWRkrf gf kgRI INMYWPT 

++ + +++G PL++E+++ ++ + ++I Y+ + 
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD — IPIGELYGLS 423 

HMM ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQIQPiGViGE 
E++ T+ + + R +++G+ + + + + +N G IGE 

Query 424 ESSGPHTISNQNN — Y RLLSCGKILTGCKNMLFQQN KDG-IGE 463 

HMM LYIgGWPGVARGYWNRPELTEERFipNPFWPGEYRrGWNrRMYRTGDLAR 
+++ G ++ GY+ + +T E+ + ++ ++GDL++ 
Query 464 ICLWG-RHIFMGYLESETETTEAIDDEGW LHSGDLGQ 499 

HMM WIPDGnlEYLGRID. DQVKIRGYRIELGEIEhqLr . qHPglqEAVV* 

+ G+++ G I + G+++ + +E+ + ++P 1+ A 
Query 500 LDGLGFLYVTGH I KEI LITAGGENVPPI PVETLVKKKI PI ISNAML 545 
DKFZphtes3_35k24 



group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins. 
The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



unknown 

membrane regions: 5 

Summary: DKFZphtes3_35k24 encodes a novel 514 amino acid protein. 
No homologues found in bacteria, yeast or C. elegans; possibly specific to 
mammals. 



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2706 bp 

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675 



1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 

51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 

101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 

151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 

201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 

251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 

301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 

351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 

401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 

451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT 

501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 

551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 

601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 

651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 

701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 

751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT 

801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 

851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 

901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 

951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 

1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 

1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA 

1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA 

1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC 

1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT 

1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT 

1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC 

1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 

1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 

1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC 
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC 

1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC 
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAACCT 
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA 
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC 
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC 
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG 
2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT 
2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG 
2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC 
2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT 
2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG 
2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT 
2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA 
2701 AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 



1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC 
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM 
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE 
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT 
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD 
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG 
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK 
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS 
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR 
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE 
501 ADQDPTTSKS TPTN 
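
The peptide above can be checked against the insert sequence by translating from the stated ORF start. The sketch below, in Python, translates the first 33 nucleotides of the ORF (positions 67 to 99 of the listed insert, read off the sequence rows above) with the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative and not part of the original analysis pipeline.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    residues = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Bases 67-99 of the DKFZphtes3_35k24 insert (start of the ORF).
orf_start = "ATGGGTAAAGACTTTCGTTACTATTTCCAGCAT"
print(translate(orf_start))  # MGKDFRYYFQH
```

The output matches the first eleven residues of the listed peptide (MGKDFRYYFQ H...), which supports the stated ORF start at 67 bp in frame 1.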

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k24 , frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35k24, frame 1 



Report for DKFZphtes3_35k24.1 



[LENGTH] 514 

[MW] 60185.03 

[pi] 8.67 

[PROSITE] MYRISTYL 5 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 8 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 6 

[KW] SIGNAL_PEPTIDE 32 

[KW] TRANSMEMBRANE 5 

[KW] LOW_COMPLEXITY 15.37 % 



SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR 

SEG 

PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc 

MEM 

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF 

SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMM 
SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP 

SEG xxx 

PRD hhhhhhhhhhccccccceeeeecccccchhhhhhhhhhccccccccchhhhhhhhhhccc 

MEM MMMMMMMMMMMM 

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE 

SEG 

PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh 

MEM 

SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN 

SEG xxxxxxxxxxxxxx. . . 

PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc 

MEM MMMMMMMMMMMMMMMMM 

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS 

SEG 

PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec 

MEM 

SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN 

SEG 

PRD cccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_35k24.1 

PS00001 149->153 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00001 364->368 ASN_GLYCOSYLATION PDOC00001 
PS00001 371->375 ASN_GLYCOSYLATION PDOC00001 
PS00001 487->491 ASN_GLYCOSYLATION PDOC00001 
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001 
PS00004 435->439 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005 
PS00005 187->190 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005 
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005 
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006 
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006 
PS00006 438->442 CK2_PHOSPHO_SITE PDOC00006 
PS00006 456->460 CK2_PHOSPHO_SITE PDOC00006 
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006 
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006 
PS00007 326->334 TYR_PHOSPHO_SITE PDOC00007 
PS00008 48->54 MYRISTYL PDOC00008 
PS00008 79->85 MYRISTYL PDOC00008 
PS00008 106->112 MYRISTYL PDOC00008 
PS00008 134->140 MYRISTYL PDOC00008 
PS00008 159->165 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35k24 . 1) 
DKFZphtes3_35nl2 



group: metabolism 

DKFZphtes3_35nl2 encodes a novel 315 amino acid protein with strong similarity to ADP,ATP 
carrier T (ANT) proteins. 

The novel protein contains three mitochondrial energy transfer signatures and is closely 
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein most 
abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits 
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore 
through which ADP is moved into the mitochondrial matrix and ATP is moved from the matrix 
into the cytoplasm. 

The new protein can find application in modulation of ADP-transport and energy metabolism in 
cells/mitochondria. 



strong similarity to ADP/ATP carrier proteins 

EST hits to mouse and drosophila 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1803 bp 

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772 



1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATGCTGCA GCGGTTTTCC 

51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG 

101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA 

151 GAAGGCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC 

201 TGGCCGGCGG AGTCGCGGCA GCTGTGTCCA AGACAGCGGT GGCGCCCATC 

251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG GCGTCGTCGA AGCAGATCAG 

301 CCCCGAGGCG CGGTACAAAG GCATGGTGGA CTGCCTGGTG CGGATTCCTC 

351 GCGAGCAGGG TTTCTTCAGT TTTTGGCGTG GCAATTTGGC AAATGTTATT 

401 CGGTATTTTC CAACACAAGC TCTAAACTTT GCTTTTAAGG ACAAATACAA 

451 GCAGCTATTC ATGTCTGGAG TTAATAAAGA AAAACAGTTC TGGAGGTGGT 

501 TTTTGGCAAA CCTGGCTTCT GGTGGAGCTG CTGGGGCAAC ATCCTTATGT 

551 GTAGTATATC CTCTAGATTT TGCCCGAACC CGATTAGGTG TCGATATTGG 

601 AAAAGGTCCT GAGGAGCGAC AATTCAAGGG TTTAGGTGAC TGTATTATGA 

651 AAATAGCAAA ATCAGATGGA ATTGCTGGTT TATACCAAGG GTTTGGTGTT 

701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA 

751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT 

801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT 

851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA 

901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAACATA TACCAACATG 

951 AAGGAATCAG TTCCTTTTTT CGTGGCGCCT TCTCCAATGT TCTTCGCGGT 

1001 ACAGGGGGTG CTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT 

1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC 

1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC 

1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT 

1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG 

1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT 

1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTAAAATTCT TTTTATGATT 

1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA 

1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT 

1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC 

1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT TCTATATCTC TTCTAAGACA 

1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT 

1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA 

1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA 

1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA 

1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA 

1801 AAA 
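
The reported positions of the polyadenylation signal and the poly A stretch can be related to the 3' end of the sequence above. The Python sketch below scans the last 53 bases (positions 1751 to 1803) for the canonical AATAAA hexamer and for the terminal run of A residues. It assumes the reported positions are 0-based offsets of the first base of each feature; this is an inference from the numbers in this listing, not something the document states. The function name `tail_features` is hypothetical.

```python
import re

TAIL_START = 1751  # 1-based position of the first base in `tail`
tail = (
    "TTAGTTTGTA" "TATTTTGTTG" "ACAATAAAGG" "AAGCTTAACT" "GTTAAAAAAA"  # 1751-1800
    "AAA"                                                              # 1801-1803
)


def tail_features(seq, first_pos):
    """Return 0-based offsets of the first AATAAA hexamer and the terminal A run."""
    offset = first_pos - 1  # 0-based offset of seq[0] within the full insert
    signal = seq.find("AATAAA")
    poly_a = re.search(r"A+$", seq)
    return (
        offset + signal if signal >= 0 else None,
        offset + poly_a.start() if poly_a else None,
    )


print(tail_features(tail, TAIL_START))  # (1772, 1793)
```

Under the 0-based assumption, both values agree with the "polyadenylation signal at pos. 1772" and "Poly A stretch at pos. 1793" annotations for this clone.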



BLAST Results 



No BLAST result 



Medline entries 



96289608 : 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters. 
Peptide information for frame 2 



ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50) 

MITOCH_CARRIER (145-155) 

MITOCH_CARRIER (242-252) 



1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV 
51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN 
101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR 
151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR 
201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM 
251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL 
301 YDKIKEFFHI DIGGR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35nl2, frame 2 

PIR:S37210 ADP, ATP carrier protein T1 - mouse, N = 1, Score = 1127, P = 
2.7e-114 

PIR:A44773 ADP, ATP carrier protein T1 - human, N = 1, Score = 1125, P = 
4.4e-114 

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila 
melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P 
= 5.6e-114 

PIR:XWBO ADP, ATP carrier protein T1 - bovine, N = 1, Score = 1121, P = 
1.2e-113 



>PIR:S37210 ADP, ATP carrier protein T1 - mouse 
Length = 298 

HSPs: 



Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114 
Identities = 214/293 (73%), Positives = 248/293 (84%) 



Query:  17 ASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMVDCLVRIPRE 76 
           A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E 
Sbjct:   5 ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE 64 

Query:  77 QGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAG 136 
           QGF SFWRGNLANVIRYFPTQALNFAFKDKYKQ+F+ GV++ KQFWR+F  NLASGGAAG 
Sbjct:  65 QGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDRHKQFWRYFAGNLASGGAAG 124 

Query: 137 ATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGI 196 
           ATSLC VYPLDFARTRL  D+GKG  +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI 
Sbjct: 125 ATSLCFVYPLDFARTRLAADVGKGSSQREFNGLGDCLTKIFKSDGLKGLYQGFSVSVQGI 184 

Query: 197 IVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRMMMQSGE 256 
           I+YRA+YFG YDT KG+LP PK    +VS+ IAQ VT  +G++SYPFDTVRRRMMMQSG 
Sbjct: 185 IIYRAAYFGVYDTAKGMLPDPKNVHIIVSWMIAQSVTAVAGLVSYPFDTVRRRMMMQSGR 244 

Query: 257 --AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF 307 
             A   Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++ 
Sbjct: 245 KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297 





Pedant information for DKFZphtes3_35nl2, frame 2 



Report for DKFZphtes3_35nl2.2 

[LENGTH] 315 
[MW] 35022.03 
[pI] 9.91 
[HOMOL] PIR:S37210 ADP, ATP carrier protein T1 - mouse 1e-115 
[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72 
[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14 
[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14 
[FUNCAT] 07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w] 1e-13 
[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 4e-13 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10 
[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06 
[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKR052c] 2e-06 
[BLOCKS] BL00215B Mitochondrial energy transfer proteins 
[BLOCKS] BL00215A Mitochondrial energy transfer proteins 
[PIRKW] duplication 1e-115 
[PIRKW] phosphate transport 2e-09 
[PIRKW] heart 3e-24 
[PIRKW] transmembrane protein 1e-115 
[PIRKW] mitochondrial inner membrane 7e-72 
[PIRKW] transport protein 4e-09 
[PIRKW] acetylated amino end 1e-115 
[PIRKW] adipose tissue 5e-13 
[PIRKW] mitochondrion 1e-115 
[PIRKW] alternative splicing 2e-09 
[PIRKW] methylated amino acid 1e-115 
[PIRKW] chloroplast 2e-14 
[PIRKW] homodimer 1e-115 
[SUPFAM] hypothetical protein YFR045w 3e-07 
[SUPFAM] ADP, ATP carrier protein 1e-115 
[SUPFAM] Bt1 protein 2e-14 
[SUPFAM] ADP, ATP carrier protein repeat homology 1e-115 
[SUPFAM] probable carrier protein YPR021c 1e-12 
[PROSITE] MITOCH_CARRIER 3 
[PFAM] Mitochondrial carrier proteins 
[KW] TRANSMEMBRANE 2 
[KW] LOW_COMPLEXITY 4.76 % 



SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE 

SEG 

PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ 

SEG 

PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc 

MEM 

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD 

SEG xxxxxxxxxxxxxxx 

PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc 

MEM 

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS 

SEG 

PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec 

MEM . . . . MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL 

SEG 

PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee 

MEM MMMMMMMMMMM 

SEQ YDKIKEFFHIDIGGR 

SEG 

PRD hhhhhhheeeecccc 

MEM 
Prosite for DKFZphtes3_35nl2.2 

PS00215 40->50 MITOCH_CARRIER PDOC00189 

PS00215 145->155 MITOCH_CARRIER PDOC00189 

PS00215 242->252 MITOCH_CARRIER PDOC00189 



Pfam for DKFZphtes3_35nl2.2 
HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGmMeHTvMFPIDtIKTRMQlQgEMpM. .ahpRYkGMI 

+ F+KD+LAGG+A+++++T+++PI+++K+++Q+Q +++ RYKGM+ 
Query 19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67 

HMM dCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFYEFMKeMFiDyfge 
DC+ +I++++G++++WRG++ANVIRY+P++A++F+F++ +K +F + +++ 
Query 68 DCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117 

HMM ddnyWmWFwmnYMaGsmAGEwisvIitYPMWvVKTRLQaDqkHphsQp.R 
++W+WF+ N+++G++AG ++S+ ++YP+++++TRL D +++++ R 
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD--IGKGPEER 164 

HMM hYNGvWNcWrklYReEGgFkGLYRGWtPTWMRMIPYqmiYFf vYEtLKeW 

+++G+ +C KI +++G ++GLY+G++ +++++I+Y++ YF++Y+T K + 
Query 165 QFKGLGDCIMKIAKSDG-IAGLYQGFGVSVQGIIVYRASYFGAYDTVKGL 213 

HMM lynYtgYnPgprelCMddsPwWhWilgWmlAGMiaWivSYPfDVVRTRMM 
L +++ + + + ++++I + + ++ + + + H + SYPFD+VR+RMM 

Query 214 LP KPK— KTPFLVSFFIAQVVT-TCSGILSYPFDTVRRRMM 251 

HMM Mdsm.edhkYqSmlDCWMqIYKnEGFkGFWKGFWPRIMRiMPWtAIMFml 
M+S+ ++++Y+++LDC+++IY++EG+ +F++G+ +++R+ ++A+++++ 
Query 252 MQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGT-GGALVLVL 300 

HMM YEqMKwFL* 
Y+ +K+F+ 

Query 301 YDKIKEFF 308 
DKFZphtes3_35n24 



group: testes derived 

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins. 

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a 
domain, approximately one hundred amino acids long and including a conserved intra-domain 
disulfide bond (Ig domain). Thus, the novel protein is a new member of the Ig superfamily. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 



complete cDNA, complete cds, EST hits 



Sequenced by DKFZ 
Locus : unknown 



Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560 



1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 
51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 
101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 
151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 
201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 
251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC 
301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 
351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 
401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 
451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 
501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 
551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 
601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 
651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC 
701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT 
751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 
801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 
851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 
901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC 
951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 
1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 
1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 
1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 
1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 
1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 
1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 
1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 
1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 
1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 
1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 
1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 
1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 78 bp to 1172 bp; peptide length: 365
Category: putative protein
Prosite motifs: IG_MHC (35-42)



1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH
51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF
101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE
151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY
201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN
251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD
301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI
351 QELLSLISTE DHPIT



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_35n24, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_35n24, frame 3

Report for DKFZphtes3_35n24.3

[LENGTH] 365

[MW] 41768.24

[pI] 5.82

[BLOCKS] BL00273 Heat-stable enterotoxins proteins

[PROSITE] MYRISTYL 1

[PROSITE] IG_MHC 1

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.11 % 
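
The ASN_GLYCOSYLATION count in the report can be reproduced with a short scan of the peptide. The sketch below assumes the usual PROSITE PS00001 pattern N-{P}-[ST]-{P}, written here as a Python regular expression; the function name is hypothetical and the peptide is the 365 aa frame 3 translation given for this clone.

```python
import re

# Frame 3 peptide of DKFZphtes3_35n24 (365 aa), in rows of 60 residues.
PEPTIDE = (
    "MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL"
    "RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK"
    "LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL"
    "LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV"
    "SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD"
    "KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE"
    "DHPIT"
)

# Assumed PS00001 pattern N-{P}-[ST]-{P}; the lookahead keeps overlapping sites.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(peptide: str) -> list:
    """Return 1-based start positions of putative N-glycosylation sites."""
    return [m.start() + 1 for m in N_GLYC.finditer(peptide)]


print(n_glycosylation_sites(PEPTIDE))  # prints [168, 272, 322]
```

The three start positions agree with the three PS00001 entries listed for this peptide.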



SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL
SEG
PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK
SEG xxxxxxxxxxxxxxx
PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL
SEG
PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV
SEG
PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD
SEG
PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE
SEG
PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc

SEQ DHPIT
SEG
PRD ccccc

PS00001 168->172 ASN_GLYCOSYLATION PDOC00001
PS00001 272->276 ASN_GLYCOSYLATION PDOC00001
PS00001 322->326 ASN_GLYCOSYLATION PDOC00001
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006
PS00006 69->73 CK2_PHOSPHO_SITE PDOC00006
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006
PS00007 85->93 TYR_PHOSPHO_SITE PDOC00007
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007
PS00007 185->194 TYR_PHOSPHO_SITE PDOC00007
PS00008 275->281 MYRISTYL PDOC00008
PS00009 11->15 AMIDATION PDOC00009
PS00290 35->42 IG_MHC PDOC00262



DKFZphtes3_35n9



group: metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human
carboxylesterase (EC 3.1.1.1).

The novel protein contains both a carboxylesterase B1 and a B2 pattern. In comparison to
EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 458-474 are missing.

The new protein can find application in modulation of carboxylester metabolism and as a new
enzyme for biotechnological production processes.



carboxylesterase, splice variant

5' extension of mRNA and N-terminal elongation of protein (64 aa),
missing exon: aa 458-474 of JC5408 are missing

Sequenced by DKFZ

Locus: unknown

Insert length: 2888 bp

Poly A stretch at pos. 2878, no polyadenylation signal found



1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 
151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 
201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 
301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 
351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAAGTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 
451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG 
501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 
551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 
601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 
651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA
701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT 
751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 
801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 
851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 
901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 
951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 
1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 
1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 
1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 
1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 
1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 
1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 
1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 
1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 
1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 
1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 
1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 
1551 GAAGGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 
1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 
1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 
1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 
1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 
1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 
1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 
1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 
1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 
2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 
2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 
2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 
2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 
2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 
2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 
2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC
2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC
2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC
2451 TACGAGTTCC AGCATCAGCC CAGCTGGCTC AAGAACATCA GGCCACCGCA
2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA
2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT
2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG
2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA
2851 CTAAGGAGAA AGAAGTTGAT TCCTTCATAA AAAAAAAA 



BLAST Results 



Entry D50579 from database EMBL:

Homo sapiens mRNA for carboxylesterase, complete cds. 
Score = 7197, p = 0.0e+00, identities = 1441/1443 

Entry JC5408 from database PIR: 
carboxylesterase (EC 3.1.1.1) - human 

Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559, 
frame +3 
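
The percentages that accompany such identity and positive counts in the alignments of this document appear to be truncated rather than rounded (for example, 542/559 is about 96.96% but is reported as 96%). A minimal sketch of that convention, with a hypothetical function name:

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage as printed in the reports: integer part only, no rounding."""
    return matches * 100 // aligned


# Counts taken from the hits reported for this clone.
print(blast_percent(542, 559))    # prints 96
print(blast_percent(543, 559))    # prints 97
print(blast_percent(1441, 1443))  # prints 99
```

Truncation is inferred from the reported values only; it is not documented in the report itself.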



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 
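
The frame and peptide length follow directly from the ORF coordinates. The sketch below assumes 1-based inclusive coordinates that exclude the stop codon, which is consistent with the three ORFs reported in this section; the function name is hypothetical.

```python
def orf_summary(start: int, end: int) -> tuple:
    """Return (reading frame, peptide length) for a 1-based inclusive ORF.

    Assumes the stop codon is not part of the coordinates.
    """
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length // 3


print(orf_summary(954, 2774))  # prints (3, 607)
print(orf_summary(78, 1172))   # prints (3, 365)
```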

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295)
CARBOXYLESTERASE_B_2 (185-196)



1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ 

51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT 

101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR 

151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE 

201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS 

251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS 

301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE

351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP 

401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG 

451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY

501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG 

551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP 
601 EERHTEL 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35n9, frame 3 

PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808, 
P = 1.9e-292 



TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human 
carboxylesterase (hCE-2) mRNA, complete cds., N = 1, Score = 2761, P = 
1.8e-287 



PIR:A34329 60K esterase (EC 3.1.1.-) isoform 2 - rabbit, N = 1, Score = 
1985, P = 3.1e-205 

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus 
norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score = 
1984, P = 4e-205 



>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 
Length = 559 



HSPs : 



Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292
Identities = 542/559 (96%), Positives = 543/559 (97%)

Query:  65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124
           MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG
Sbjct:   1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184
           IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS
Sbjct:  61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244
           EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304
           VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364
           PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424
           LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484
           KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528
           LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588
           EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540

Query: 589 ALPQKIQELEEPEERHTEL 607
           ALPQKIQELEEPEERHTEL
Sbjct: 541 ALPQKIQELEEPEERHTEL 559



Pedant information for DKFZphtes3_35n9, frame 3 



Report for DKFZphtes3_35n9.3



[LENGTH] 607 

[MW] 67051.20 

[pI] 6.11

[HOMOL] PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0

[BLOCKS] BL01173A Lipolytic enzymes "G-D-X-G" family, histidine

[BLOCKS] BL00122G

[BLOCKS] BL00122F

[BLOCKS] BL00122E 

[BLOCKS] BL00122D Carboxylesterases type-B serine proteins 

[BLOCKS] BL00122C Carboxylesterases type-B serine proteins 

[BLOCKS] BL00122B Carboxylesterases type-B serine proteins 

[BLOCKS] BL00122A Carboxylesterases type-B serine proteins 

[SCOP] d1akn 3.56.1.1.4 Bile-salt activated lipase [Bovine (Bos taurus 1e-158

[SCOP] d2ack 3.56.1.1.1 Acetylcholinesterase [Electric ray (Torped 1e-170

[SCOP] d1thg 3.56.1.9.7 type-B carboxylesterase/lipase [fungu 1e-149

[EC] 3.1.1.13 Sterol esterase 1e-52

[EC] 3.1.1.7 Acetylcholinesterase 5e-74

[EC] 3.1.1.1 Carboxylesterase 0.0

[EC] 3.1.1.8 Cholinesterase 5e-68

[EC] 3.1.1.59 Juvenile-hormone esterase 1e-34

[EC] 3.1.1.3 Triacylglycerol lipase 3e-52 

[PIRKW] duplication 2e-47 

[PIRKW] homotetramer 3e-67 

[PIRKW] transmembrane protein 9e-44 

[PIRKW] microsome 1e-130

[PIRKW] pancreas 3e-52

[PIRKW] endoplasmic reticulum 1e-134

[PIRKW] homotrimer 1e-134

[PIRKW] phosphatidylinositol linkage 5e-74

[PIRKW] synapse 3e-73

[PIRKW] liver 1e-131

[PIRKW] heparin binding 3e-52 




WO 01/12659 



^ E X L\t\VV J 


phosphoprotsin 1q~ 2 5 






L * X r\J\W J 


giyLopiutcni ic IJfl 






r PTRKWl 


1"h vmi H hnrninno V\i no\;nt"h(»t!'i <? 


2e- 


•47 


f PTRKWl 
[el r\rvw j 


t-di.JJtJAyj.xt_ cslci nycix oxa b t= u 


. 0 




[ P I RKW ] 


itionoms r 2 s — 4 2 






TPTRKW1 


Hi enl f- -i Ha hnn H O o T 1 






r PTRKWl 
L it x r\i\ vv j 


m^Tnrn^i t~\/ nl^nr) To - •iO 

ILlCHvLiLLdXy ^±U11V^ *J C JiL 








aiLcxildtlvc t>pxxc_xng jc / 1 






f PTRKWl 
in r\.i\Y* j 


T f"nH 1 T1 ^ O & — / 






[PI RKW ] 


pyrogXut srnic 3 c i ci 6s — 39 






r PTRKWl 








[PIRKW] 


ttiii^pIp 3p— 73 






f PTRTfWl 


LIlyLUlU y laliU *i / 






[PIRKW] neurotransmitter degradation 3e-73

[PIRKW] cholesterol 3e-52

[PIRKW] homodimer 2e-47

[PIRKW] nerve 3e-73

[SUPFAM] cholinesterase 0.0

[SUPFAM] triacylglycerol lipase 1e-32

[SUPFAM] cholinesterase homology 0.0

[SUPFAM] thyroglobulin 2e-47

[SUPFAM] thyroglobulin type I repeat homology 2e-47

[SUPFAM] juvenile-hormone esterase 2e-35

[SUPFAM] probable lipolytic protein ybaC 1e-07

[PROSITE] CARBOXYLESTERASE_B_2 1

[PROSITE] CARBOXYLESTERASE_B_1 1

[PFAM] Carboxylesterases

[KW] Alpha_Beta

[KW] 3D

[KW] LOW_COMPLEXITY 3.95 %







SEQ MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET
SEG xxxxxxxx . . .
1acj-

SEQ SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ
SEG xxxxx
1acj- ETTEEEECEEEEETTEE-- EE

SEQ TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS
SEG
1acj- EEEEEECEETTTGGGTTTCCEECCCCCCEEECCCCCCBCCCCCCTTTTTT-HHHHHCCCC

SEQ DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ
SEG
1acj- CCBTTTTCEEEEEET--TTTTTTEEEEEEECTTTTTTCTTTTGCHHHHHHHHCCEEEECC

SEQ YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS
SEG
1acj- CCCCGGGCCCTTTTTTTCCHHHHHHHHHHHHHHHCGGGGCEEEEEEEEEEECHHHHHHHH

SEQ LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS
SEG
1acj- HHHCGGGTTTTCEEEEETTTTTTTTTTBCHHHHHHHHHHHHC-CCCCCHHHHHHHHHHCC

SEQ KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI
SEG
1acj- HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEEEEETBTHHHHHHTTTTT

SEQ YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF
SEG
1acj- TTTCCCCCHHHHHHHHHHHTTTTCHHHHHHHHHHCTTTTTTTHHHH-HHHHHHHHHHHHH

SEQ VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA
SEG
1acj- HHHHHHHHHHHHCCCCEEEEEECCCCGGGTTBTTTHHHCGGGCCCHHHHHHHHHHHHHHH

SEQ NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP
SEG xxxxx
1acj- HHHHHCCCCCCC -- CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH

SEQ EERHTEL
SEG xxxxxx .
1acj-

PS00122 279->295 CARBOXYLESTERASE_B_1 PDOC00112

PS00941 185->196 CARBOXYLESTERASE_B_2 PDOC00112



Pfam for DKFZphtes3_35n9.3

HMM_NAME Carboxylesterases

HMM       *MfMnwlimFLLwmItWIi.WheqaprpPdPyiVdtnnCGkIRGmNedtD
          + +L+++ ++++++++ ++Q++++P I T+ G + G ++ +
Query  69 RLRARLSAVACGLLLLLVRGQGQDSASP IRTTHT-GQVLGSLVHVK 113

HMM       NG..pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW
          + + +FLGIP+A+PP+G LRF +P+P +E W++V++ T+ P MC+Q+
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162

HMM       ndFGFWIFdmieMWNeniP..eMSEDCLYLNVWTPWnrkPNskLPVMVWI
          +++ +4+N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI
Query 163 LTAV--ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210

HMM       HGGGFMFGSGhsYPliqYDgeylMMeeNVIVVtlNYRLGPFGFLSTgDid
          HGG+++FG + ++YDG+ L++ ENV+VV I+YRLG++GF+STGD +
Query 211 HGGALVFGMA SLYDGSMLAALENVVVVI I QYRLGVLGFFSTGDKH 255

HMM       lPPHGNWGLWDQRMALQWVQDNIAnFGGDPNNITIFGESAGGMSVHIHML
          + GNWG++DQ++AL+WVQ+NIA+FGG+P+++TIFGESAGG+SV+ ++
Query 256 AT--GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303

HMM       SYGGDNPPmfKqLFHRAIMQSGsAmcPWvIQsnyNaRqRAfRFArimGCN
          S P + +LFH AIM+SG A+ P++I S++ + +A++ C+
Query 304 S PISQGLFHGAIMESGVALLPGLIASSA-- DVI STVVANLSACD 345

HMM       rmDssEMIqCLRsKPwEELWdAtWnFWmHfYfPFlPWFFgPVIDGDDaPE
          + DS++++ CLR K+ EE++++++ +F + + +DG+
Query 346 QVDSEALVGCLRGKSKEEILAINK PFKMIPGV VDGV 381

HMM       aFIPDHPeeMIkEGkFnDVPWllGYNnDEGiWFapMmMnfnWfdEDeWId
          F+P+HP+E++-1-+ F VP I+G+NN E++W++P M + + +E++
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDT-QKEMDR 429

HMM       itNedWyeWMPYHFYrddmsNikDMDDYiDkvyEeYPgWWDrFPqESYW
          ++ + ++ M +L + + + D ++EEY+G+ + PQ
Query 430 EASQAALQKMLTLLMLPPT-F GDLLREEYIGDNGD-PQTLQA 469

HMM       nLqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFmWR
          ++Q+M+ D F++P + ++H++ +PVY+YEF+H PS +
Query 470 QFQEMMADSMFVIP--ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511

HMM       WWPpWMgvdH*
          +PP+M++DH
Query 512 IRPPHMKADH 521

HMM       *tEEEiissMRmMMNYWINFAKhGNPNnthnglCWWPqYTsnEQYdMIMe
          TEEE+ +S R MM+YW+NFA++GNPN++ GL++WP ++++EQY++ +
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570

HMM       tllraiQraCrrarDPYCNFW*
          + +4+++ + FW
Query 571 QPAVGRALKAHR--LQFW 588



DKFZphtes3_35p17



group: testes derived 

DKFZphtes3_35p17 encodes a novel 505 amino acid protein with weak similarity to
proteins of the armadillo family.

Proteins of the armadillo family are involved in diverse cellular processes in higher
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions
in intercellular junctions and signalling cascades. Others, belonging to the importin-alpha
subfamily, are involved in NLS recognition and nuclear transport, while some members of the
armadillo family have as yet unknown functions. The novel protein shows similarity to S.
cerevisiae protein Yel013p (VAC8) and Danio rerio b-catenin, but contains no armadillo (arm)
repeats.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to S. cerevisiae VAC8

complete cDNA, complete cds, few EST hits

Sequenced by DKFZ

Locus: unknown

Insert length: 1966 bp

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935



1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT 

51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT 

101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG 

151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG 

201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA 

251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG 

301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GAGTCATACG 

351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT 

401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC 

451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG

501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA 

551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC 

601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC 

651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG 

701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG 

751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT 

801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG 

851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG 

901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT 

951 TGTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT 

1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA 

1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG 

1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT 

1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA 

1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT 

1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA 

1301 TCATCTAGCA GAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG 

1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA 

1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT 

1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA 

1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT 

1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA 

1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA 

1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA 

1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA 

1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC 

1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA 

1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT 

1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC 

1951 CTTCCCAAAA AAAAAA 



BLAST Results 



No BLAST result

Medline entries



98413148:

Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and
importin alpha is associated with the yeast vacuole membrane.

98330438:

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces
cerevisiae vacuolar membrane that functions in vacuole fusion and inheritance.

98158703:

Vac8p, a vacuolar protein with armadillo repeats, functions in both
vacuole inheritance and protein targeting from the cytoplasm to vacuole.



Peptide information for frame 3 



ORF from 99 bp to 1613 bp; peptide length: 505 
Category: similarity to known protein 
Classification: unset 
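
As a consistency check, the first codons of this ORF can be translated and compared with the peptide listed below. The sketch assumes the standard genetic code and uses bases 99-128 of the insert as given in the sequence listing; the table construction and function name are illustrative, not part of the original report.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


ORF_START = "ATGGTGAATATACTTGATTCTCCACACAAG"  # bases 99-128 of the insert
print(translate(ORF_START))  # prints MVNILDSPHK
```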



1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH 
51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR 
101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL 
151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI 
201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN 
251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG 
301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL
351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR 
401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ 
451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE
501 KARYT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p17, frame 3

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 237, P = 7.8e-17

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215,
P = 4.9e-14

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA,
complete cds., N = 1, Score = 195, P = 5.8e-12



>PIR:S50446 VAC 8 protein - yeast (Saccharomyces cerevisiae) 
Length = 578

HSPs:

Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17
Identities = 106/401 (26%), Positives = 177/401 (44%)

Query:  92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSENEQLQ 151
           +GG PL A +N+ + L E Y + E ++E ++ L S++ Q+Q
Sbjct:  45 SGG-PLKALTTLVYSDNLNLQRSAALAFAEITEKYVRQVSRE-VLEPILILLQSQDPQIQ 102

Query: 152 EHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVT 211
           A+ A + E + L+ GGL+PL + + DN E G I + +N
Sbjct: 103 VAACAALGNLAVNNENKLLIVEMGGLEPLINQMMG-DNVEVQCNAVGCITNLATRDDNKH 161

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGIN 271
           K A+ L L + V N GAL ENR + G + LV+LL +
Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221

Query: 272 QALLVNVTKAVGACAVEPESMMIIDRLDG--VRLLWSLLKNPHPDVKASAAWALCPCIKN 329
           + T A+ AV+ + + + + V L SL+ +P VK A AL +
Sbjct: 222 PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 281

Query: 330 AKDAGEMVRSFVGGLELIVNLLKSDNKE-VLASVCAAITNIAKDQENLAVITDHGVV-PL 387
           E+VR+ GGL +V L++SD+ VLASV A I NI+ N +I D G + PL
Sbjct: 282 TSYQLEIVRA--GGLPHLVKLIQSDSIPLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 338

Query: 388 LSKLANTNNNKLRHHLAEAISRCCMWG-RNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQ 446
           + L ++ +++ H + +NR F E AV + +V ++
Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 397

Query: 447 ALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCISNI 492
           A++ AD + + + E +L+MS +Q++ AA ++N+
Sbjct: 398 ACFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANL 444




Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14
Identities = 81/341 (23%), Positives = 163/341 (47%)

Query: 163 EDKETRDLVRLHGGLKPLASLLNNTD-NKERLAAVTGAIWKCSISKENVTKFREYKAIET 221
           EDK+ D G LK L 4-L+ + + N +R AA+ A I+++- V + + +E
Sbjct:  36 EDKDQLDFYS-GGPLKALTTLVYSDNLNLQRSAALAFA----EITEKYVRQVSR-EVLEP 89

Query: 222 LVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKA 281
           ++ LL Q ++ V ALG EN++++ + GG++PL+N ++G N + N
Sbjct:  90 ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 149

Query: 282 VGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFV 341
           + A ++ I -1- L L K+ H V+ +A AL + ++ E+V +
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-- 207

Query: 342 GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI--TDHGVVPLLSKLANTNNNKL 399
           G + ++V+T.T, 9 + +V A++NIA D+ K + T+ +V L L ++ + + + +
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 400 RHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCI 459
           + A+ ++ + LV+ ++S+ + A+ + +S N
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327

Query: 460 TMHENGAVKLLLDMVGSPDQDLQEAAAGCISNIRRLALATEKAR 503
           + + G +K L+ ++ D + E +S +R LA ++EK R
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSE--EIQCHAVSTLRNLAASSEKNR 369




Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10
Identities = 80/346 (23%), Positives = 142/346 (41%)

Query: 145 SENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCS 204
           S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ +
Sbjct:  58 SDNLNLQRSAALAFAEITE-KYVRQVSR--EVLEPILILLQSQDPQIQVAACA-ALGNLA 113

Query: 205 ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLV 264
           + + EN E +E L+ + EV N VG + +N+ + G + PL
Sbjct: 114 VNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGCITNLATRDDNKHKIATSGALIPLT 173

Query: 265 NLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALC 324
           L + + N T A+ E+ + V +L SLL + PDV+ AL
Sbjct: 174 KLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTALS 233

Query: 325 PCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVITDHGV 384
           ++++++ + +V+L+ S + V A+ N+A D I G
Sbjct: 234 NIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 293

Query: 385 VPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 444
           + P L KL +++ L I + N + + PLVR L D+ +
Sbjct: 294 LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 353

Query: 445 A-QALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCIS 490
           A L L+ ++ N E+GAV+ ++ +Q + C +
Sbjct: 354 AVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFA 401




Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08
Identities = 88/401 (21%), Positives = 175/401 (43%)

Query:  60 LYEARD--VEVARCGALALWSCSKSHTNKEAIRKAGGI-PLLARLLKTSHENMLIPVVGT 116
           L +++D ++VA C AL + 4- ++ NK I + GG+ PL+ +++ + E + VG
Sbjct:  93 LLQSQDPQIQVAACAALG--NLAVNNENKLLIVEMGGLEPLINQMMGDNVE-VQCNAVGC 149

Query: 117 LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 175
           + A+ ++ + I + L K S++ ++Q + A+ +E R +LV G
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-G 208

Query: 176 GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR--EYKAIETLVGLLTDQPEEV 233
           + L SLL++TD + T A+ ++ + N K E + + LV L+ V
Sbjct: 209 AVPVLVSLLSSTDPDVQYYCTT-ALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 234 LVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMM 293
           AL + ++ 4- + GG+ LV L+ + L++ + ++ P +
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327

Query: 294 IIDRLDGVRLLWSLLK-NPHPDVKASAAWALCPCIKNA-KDAGEMVRSFVGGLELIVNLL 351
           + 1 ++ L LL +++ A L ++ K+ E S G +E L
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFES--GAVEKCKELA 385

Query: 352 KSDNKEVLA--SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISR 409
           V + SCAI +AD L+++ ++ L + + N++ + A A++
Sbjct: 386 LDSPVSVQSEISACFAILALA-DVSKLDLL-EANILDALIPMTFSQNQEVSGNAAAALAN 443

Query: 410 CCMWGRNRVAFGE-----HKAVAP-LVRYLKSNDTNVHRATAQALYQLSE 453
           C N E ++ + L+R+LKS+ + QL E
Sbjct: 444 LCSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLE 493




Score = 139 (20.9 bits), Expect = 5.0e-06, P = 5.0e-06
Identities = 80/329 (24%), Positives = 142/329 (43%)




Query: 37 GGITKLVALLDCAHD-STKPAQ SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKA 92
Sbjct: 148 GCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA 207

Query: 93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267

Query: 151 QEHCAMAIYQCAEDKETR-DLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKEN 209
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRA-GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLN 325

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267
Sbjct: 326 EGLIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELA 385

Query: 268 VG--INQALLVNVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-L 323
Sbjct: 386 LDSPVSVQSEISACFAILALADVSKLDLLEANILDAL-IPMTFSQNQEVSGNAAAALANL 444

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELIVNLLKSD 354
Sbjct: 445 CSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSD 476




Score = 136
Identities = 72/304 (23%), Positives = 133/304 (43%)

Query: 58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTL 117

+ L +++ + V R AL + + S N++ + AG +P+L LL ++ + + L

Sbjct: 173 TKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTAL 232

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174

A +E N + + E R++ LV ++S + +++ +A+ AD + ++VR

Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291

Query: 175 GGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLT-DQPEEV 233

GGL L L+ + D+ + A I SI N + ++ LV LL EE+

Sbjct: 292 GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350

Query: 234 LVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLLVG--INQALLVNVTKAVGACA-VEP 289

+ V L E NR + G ++ L + ++ ++ A+ A A V

Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410

Query: 290 ESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-LCPCIKN-AKDAGEMVRSFVGGLELI 347

++ + LD + + + +N A+AA ALC+NK RG +

Sbjct: 411 LDLLEANILDAL-IPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAWDRPNEGIRGFL 469

Query: 348 VNLLKSD 354

+ LKSD

Sbjct: 470 IRFLKSD 476




Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03
Identities = 71/335 (21%), Positives = 132/335 (39%)

Query: 1 MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 60

+ + SH++A +N+ +R++ G + LV+LL ST P

Sbjct: 172 LTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLS STDP 222

Query: 61 YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 120

DV+ AL+ + +++ KA++LL++ + L+
Sbjct: 223 DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 278

Query: 121 ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 180

AS+ +Y+ I + +LVK + S++ L I + L+ G LKPL 

Sbjct: 279 ASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPL 338 

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKE-NVTKFREYKAIETLVGLLTDQPEEVLVNVVG 239 

LL+ D++E + + S E N +F E A+E L D P V +

Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISA 398

Query: 240 ALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVG-ACAVEPESMMIIDRL 298

+++ + + + L+ + NQ + N A+ C+ II+

Sbjct: 399 CFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAW 458

Query: 299 D GVR-LLWSLLKNPHPDVKASAAWALCPCIKNAKDAGE 335

D G+R L LK+ + + A W + +++ D E
Sbjct: 459 DRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLESHNDKVE 500 

Score = 106 (15.9 bits), Expect = 2.0e-02, P = 2.0e-02 
Identities = 49/204 (24%), Positives = 89/204 (43%) 

Query: 65 DVEVARCGALA-LWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQECA-S 122

+VEV +C A+ + + + NK I +G + L +L K+ H + G L S

Sbjct: 139 NVEV-QCNAVGCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHS 197

Query: 123 EENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRD-LVRLHGGL-KPL 180

EEN + + A + LV L+S + +Q +C A+ A D+ R L + L L
Sbjct: 198 EENRKELVNAGAV-PVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKL 256

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 240

SL+++ ++ + A T A+ + + + LV L+ +++ V

Sbjct: 257 VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315

Query: 241 LGECCQERENRVIVRKCGGIQPLVNLL 267

+ N ++ G ++PLV LL

Sbjct: 316 IRNISIHPLNEGLIVDAGFLKPLVRLL 342

Pedant information for DKFZphtes3_35pl7, frame 3 



Report for DKFZphtes3_35pl7 . 3 

[LENGTH] 505 

[MW] 55224.34 

[pI] 8.43

[HOMOL] PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 2e-16

[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 8e-18

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 8e-18 

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL189w] 3e-06 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL189w] 3e-06 

[BLOCKS] BL01265C

[BLOCKS] BL00242A Integrins alpha chain proteins

[SCOP] d3bct 1.91.1.1.1 beta-Catenin [Mouse (Mus musculus)] 7e-18

[PIRKW] cytosol 3e-11

[PIRKW] apoptosis 3e-11

[PIRKW] carcinogenesis 3e-11

[PIRKW] cell adhesion 3e-11

[PIRKW] cytoskeleton 3e-12

[SUPFAM] pendulin 1e-07

[KW] All_Alpha 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.38 % 
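
The LOW_COMPLEXITY figure appears to be the share of residues that SEG masks with "x" in the lines that follow. That reading is an assumption inferred from the numbers in these reports (12 masked residues out of 505 gives 2.38 %), not something the report states. A minimal sketch, with a hypothetical function name:

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues flagged 'x' in the concatenated SEG mask lines.

    Assumption: every 'x' marks one low-complexity residue and `length`
    is the [LENGTH] value of the same report.
    """
    masked = seg_mask.count("x")
    return round(100.0 * masked / length, 2)


# 12 masked residues in a 505-residue protein
print(low_complexity_percent("x" * 12, 505))
```

The same ratio reproduces the values quoted for the other reports in this listing when the "x" runs are counted by hand.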

SEQ MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 

SEG xxxxxxxxxxxx 

2bct- HH 



SEQ YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 

SEG 

2bct- HHCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCHHHHHHHGGGCCCHHHHHHHHHHHHHH 

SEQ ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 

SEG 

2bct- HHTTTHHHHHHHHCHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHHCHHHHH 
SEQ ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA

SEG 

2bct- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDG

SEG 

2bct- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH 

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA 

SEG 

2bct- HHHHHHHHHTTTHHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH 

SEQ SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAF 

SEG 

2bct- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHHCHHHHH 

SEQ GEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCITMHENGAVKLLLDMVGSPDQD

SEG 

2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT

SEG 

2bct- HHHHHHHHH



(No Prosite data available for DKFZphtes3_35pl7 . 3) 
(No Pfam data available for DKFZphtes3_35pl7 . 3) 
DKFZphtes3_35p22



group: cell cycle 



DKFZphtes3_35p22 encodes a novel 549 amino acid protein with similarity to oncogene 1 (tre-2
locus).

The novel protein is closely related to human tre-2 and other enzymes involved in the
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating
enzyme, indicating a role for the ubiquitin system in mammalian growth control.

The novel protein can find application in cancer diagnostics and treatment, and in regulating
protein stability and growth control via regulation of ubiquitination.

strong similarity to oncogene 1 (tre-2 locus) 
membrane regions : 1 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: map="17" 
Insert length: 2072 bp 

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039
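
The two positions can be checked against the 3' end of the insert. The sketch below locates the last AATAAA hexamer and the start of the terminal run of A residues in the final 72 bases of the sequence listed below (positions 2001 to 2072). It assumes that the reported positions are 0-based offsets into the insert, which is what the numbers are consistent with; the function name and the use of the canonical AATAAA hexamer are illustrative choices, not taken from the report.

```python
def tail_features(tail: str, offset: int):
    """Return (signal_position, polyA_start) as 0-based offsets.

    tail   -- 3'-terminal fragment of the cDNA
    offset -- 0-based offset of the first base of `tail` in the insert
    Assumption: the signal is the last AATAAA hexamer and the poly A
    stretch is the uninterrupted run of A at the very end.
    """
    signal = tail.rfind("AATAAA")
    polya_start = len(tail.rstrip("A"))
    signal_position = offset + signal if signal >= 0 else None
    return signal_position, offset + polya_start


# Bases 2001..2072 of the DKFZphtes3_35p22 insert
TAIL = (
    "CTTCAAACTCATGGAAGGATAACCACCTTCATGTTTTGAAATAAATGTTT"
    "CCTGTTGAAATGAAAAAAAAAA"
)
print(tail_features(TAIL, 2000))
```

A poly A stretch interrupted by other bases would need a more tolerant rule than the plain trailing run used here.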



1 GTTACACACA GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 
51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 
101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 
151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG 
201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 
251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 
301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 
351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 
401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 
451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG
501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 
551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 
601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 
651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 
701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 
751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 
801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 
851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT 
901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 
951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 
1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 
1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 
1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 
1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 
1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 
1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT 
1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 
1351 GTGGGGCTGT CCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 
1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA 
1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT
1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 
1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 
1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 
1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 
1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 
1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 
1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 
1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG 
1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 
1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 
2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 
2051 CCTGTTGAAA TGAAAAAAAA AA 



BLAST Results 



Entry AC003976 from database EMBL:

Homo sapiens chromosome 17, clone hCIT.91_J_4, complete sequence.
Score = 4385, P = 0.0e+00, identities = 881/886
14 exons

Entry HSG19723 from database EMBL: 
human STS A001W35. 
Score = 850, P = 1.9e-32, identities = 170/170 



Medline entries 



92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme 
related to a product of the human tre-2 oncogene. 

95176708: 

UBP5 encodes a putative yeast ubiquitin-specific protease
that is related to the human Tre-2 oncogene product. 



Peptide information for frame 3 



ORF from 99 bp to 1745 bp; peptide length: 549 
Category: strong similarity to known protein 
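
The ORF coordinates, the peptide length and the frame number are tied together by simple arithmetic. The sketch below assumes 1-based inclusive coordinates that exclude the stop codon and frames numbered 1 to 3 by the offset of the first base; both conventions are inferred from the figures in this listing, and the function name is hypothetical.

```python
def orf_summary(start: int, end: int):
    """Return (peptide_length, frame) for an ORF given 1-based inclusive bounds.

    Assumptions: the stop codon is not part of the interval, and the frame
    is 1, 2 or 3 according to the position of the first base.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3, (start - 1) % 3 + 1


print(orf_summary(99, 1745))   # ORF quoted for DKFZphtes3_35p22
print(orf_summary(205, 1695))  # ORF quoted for DKFZphtes3_4b4
```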



1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG 

51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK

101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS 

151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY 

201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT 

251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI 

301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL 

351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP 

401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL

451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE 

501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p22, frame 3 

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score = 
2181, P = 5.5e-226 

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157

>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 
Length = 786 

HSPs: 

Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226 
Identities = 405/500 (81%), Positives = 440/500 (88%) 
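
The percentages on this line follow directly from the counts. The sketch below parses an "Identities / Positives" line and recomputes the percentages; it uses integer truncation, which is an assumption that matches the values printed in this listing (for example 72/304 is reported as 23%). The names are illustrative.

```python
import re

HSP_STATS = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), Positives = (\d+)/(\d+) \((\d+)%\)"
)


def percent(count: int, length: int) -> int:
    """Truncated percentage, as the report appears to print it."""
    return count * 100 // length


def parse_hsp_stats(line: str) -> dict:
    """Extract the counts from one 'Identities = ..., Positives = ...' line."""
    match = HSP_STATS.search(line)
    if match is None:
        raise ValueError("not an HSP statistics line")
    ident, length, _, pos, _, _ = map(int, match.groups())
    return {
        "identities": ident,
        "positives": pos,
        "length": length,
        "identity_pct": percent(ident, length),
        "positive_pct": percent(pos, length),
    }


stats = parse_hsp_stats("Identities = 405/500 (81%), Positives = 440/500 (88%)")
print(stats)
```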

MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL
MD+VE A S AQER+DI+MKY+KGHRAGLPEDKGP+? N+++D GI+HETELPP+ 



TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WS VLLN +E+ 



KLKNPGRYQIMKE+GK+SSEHI ID DV TLR H+FFRDRYG KQREL +ILLAY EY 



NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERKSL GFHSPNGGTVQGLQDQQE 



Query: 


1 


Sbjct: 


1 


Query: 


61 


Sbjct: 


60 


Query: 


121 


Sbjct: 


120 


Query: 


181 


Sbjct: 


180 
Query: 241 HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI 300

HVV SQPKTM HQDK+ LCGQC+ LGCL+R LIDGISLGLTLRLWDVYLVEGEQ LMPI
Sbjct: 240 HVVPKSQPKTMWHQDKEGLCGQCASLGCLLRNLIDGISLGLTLRLWDVYLVEGEQVLMPI 299

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360 

T IA KVQQKRL KTSRCG WAR N+F DTWA ++DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct: 300 TSIALKVQQKRLMKTSRCGLWARLRNQFFDTWAMNDDTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 480 

VREDTYPVGTQGVPS ALAQGGPQGSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSLALAQGGPQGSWRFLEWKSMPRLPTDLDIGGPWFPHYDFERSCWV 479 



Query: 481 RAISQEDQLAPCWQAEHPAE 500 

RAISQEDQLA CWQAEH E 
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499



Pedant information for DKFZphtes3_35p22, frame 3 



Report for DKFZphtes3_35p22 . 3 



[LENGTH] 549 

[MW] 62159.16 

[pI] 9.23

[HOMOL] PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0

[FUNCAT] 11.01 stress response [S. cerevisiae, YGR100w] 2e-16

[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15

[PIRKW] transmembrane protein 6e-14

[PROSITE] MYRISTYL 6

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 3

[PROSITE] CK2_PHOSPHO_SITE 4

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 10

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 5.28 % 
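
The [MW] entry is a molecular weight derived from the peptide sequence. A minimal sketch of such a calculation is given below, assuming average residue masses and one water per chain; the mass table and the function name are not taken from the report, and the exact table used by the annotation pipeline is unknown, so small differences from the printed value are to be expected.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(peptide: str) -> float:
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[residue] for residue in peptide.upper()) + WATER


# Small sanity check on glycylglycine
print(round(molecular_weight("GG"), 3))
```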



SEQ MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 

SEG 

PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 

MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 

MEM 

SEQ KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 

MEM 

SEQ NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 

MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI

SEG 

PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 

MEM MMMMMMMMMMMMMMMMMM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 

SEG 

PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . 

PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 

MEM 
SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV

SEG 

PRD cccccccccccccccccccccccccceeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL 

SEG 

PRD cchhhhhhhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccceeee 

MEM 

SEQ ESSQFPPGF 

SEG 

PRD CCCCCCCCC 

MEM 



Prosite for DKFZphtes3_35p22 . 3 



PS00004   136->140   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   310->314   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   348->352   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   61->64     PKC_PHOSPHO_SITE    PDOC00005
PS00005   73->76     PKC_PHOSPHO_SITE    PDOC00005
PS00005   90->93     PKC_PHOSPHO_SITE    PDOC00005
PS00005   152->155   PKC_PHOSPHO_SITE    PDOC00005
PS00005   215->219   PKC_PHOSPHO_SITE    PDOC00005
PS00005   282->285   PKC_PHOSPHO_SITE    PDOC00005
PS00005   315->318   PKC_PHOSPHO_SITE    PDOC00005
PS00005   346->349   PKC_PHOSPHO_SITE    PDOC00005
PS00005   351->354   PKC_PHOSPHO_SITE    PDOC00005
PS00005   446->449   PKC_PHOSPHO_SITE    PDOC00005
PS00006   61->65     CK2_PHOSPHO_SITE    PDOC00006
PS00006   460->464   CK2_PHOSPHO_SITE    PDOC00006
PS00006   484->488   CK2_PHOSPHO_SITE    PDOC00006
PS00006   511->515   CK2_PHOSPHO_SITE    PDOC00006
PS00007   93->100    TYR_PHOSPHO_SITE    PDOC00007
PS00007   92->100    TYR_PHOSPHO_SITE    PDOC00007
PS00008   6->14      MYRISTYL            PDOC00008
PS00008   101->107   MYRISTYL            PDOC00008
PS00008   230->236   MYRISTYL            PDOC00008
PS00008   276->282   MYRISTYL            PDOC00008
PS00008   366->372   MYRISTYL            PDOC00008
PS00008   441->447   MYRISTYL            PDOC00008
PS00009   134->138   AMIDATION           PDOC00009



(No Pfam data available for DKFZphtes3_35p22 . 3) 
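
The Prosite hits listed above are short sequence patterns. The sketch below converts a pattern written in Prosite syntax into a regular expression and reports 1-based start positions of non-overlapping matches. The two patterns shown are the published consensus patterns for PS00005 (protein kinase C site) and PS00006 (casein kinase II site); the helper names and the non-overlapping scan are assumptions, and only the start coordinate is compared with the table.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple Prosite pattern such as '[ST]-x(2)-[DE]' to a regex."""
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                      # any residue
            regex = "."
        elif core.startswith("{"):           # forbidden residues
            regex = "[^" + core[1:-1] + "]"
        else:                                # literal residue or [allowed set]
            regex = core
        if low:                              # repetition count or range
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def scan(pattern: str, sequence: str, first_position: int = 1):
    """1-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() + first_position for m in regex.finditer(sequence)]


PKC_PHOSPHO_SITE = "[ST]-x-[RK]"
CK2_PHOSPHO_SITE = "[ST]-x(2)-[DE]"

fragment = "TAREAKQIRREISRKSKWVD"  # residues 61-80 of the 549 aa peptide
print(scan(PKC_PHOSPHO_SITE, fragment, 61))
print(scan(CK2_PHOSPHO_SITE, fragment, 61))
```

On this fragment the start positions agree with the PS00005 and PS00006 rows that begin at residues 61 and 73.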
DKFZphtes3_4b4



group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human
trypsin inhibitor.

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2,
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins
from eukaryotes that have been found to be evolutionarily related. The exact function of these
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes or as a new protease inhibitor.



strong similarity to trypsin inhibitor 
might be a new protease inhibitor? 
Sequenced by AGOWA 

Locus: /map="333.4 cR from top of Chr16 linkage group"
Insert length: 4574 bp

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539



1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 
51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 
101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 
151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 
201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 
251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 
301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 
351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC 
401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 
451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG 
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 
551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT 
601 GACGAGGTGA AGGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG 
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA 
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 
1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC 
1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 
1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 
1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 
1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 
1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 
1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 
1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 
1401 CCCGTTTGAA AAGCCAGCAA CTCACTGCCC AAGAATCCAT TGTCCGGCAC 
1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 
1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 
1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA 
1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 
1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 
1701 TTCCAGCACC AGGGGAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 
1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT 
1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 
1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 
1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGGGTCTCCA TCTGGACGTC 
1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 
2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC
2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC
2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 
2151 AAGTAGAAGG TAGTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 
2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 
2251 AGTAAGAGGG CTGCGGGTAT GAGAGACCCC GGCTCCGCCC TGGCACGTGT 
2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 
2351 GCTCTATACA CAGCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG 
2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATGGC CCACCTGTTT 
2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT 
2501 TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA
2751 GGAATGGAGT CTTTGGTACA TTCCTCACCG AGGTTAGCAG CTCAGTTTGT
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA
3101 ATATTTCTTA GGTGAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CAGCCCCAAA TGTCGGAGAG
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT TTTTTTTTTT
3751 TTTCCCGAGA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA
4251 GGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT
4551 GAAAACCAAA AAAAAAAAAA AAAA



BLAST Results 



Entry HS834352 from database EMBL : 
human STS WI-15502. 
Score = 1331, P = 5.4e-54, identities = 287/301 



Medline entries 



98146272:

cDNA cloning of a novel trypsin inhibitor with similarity to
pathogenesis-related proteins, and its frequent expression in
human brain cancer cells.



Peptide information for frame 1 



ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 
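
The first residues of the peptide can be checked by translating the cDNA from the ORF start. The sketch below builds the standard genetic code table and translates the 30 bases that begin at position 205 of the insert listed above; the function name is illustrative, and the standard nuclear code is assumed.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3].upper(), "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 205..234 of the DKFZphtes3_4b4 insert
print(translate("ATGAGCTGCGTCCTGGGTGGTGTCATCCCC"))
```

The result can be compared with the first ten residues of the peptide printed below.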



1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA 

51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE 

101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC 

151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY 

201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN 

251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG

301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI

351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP

401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS

451 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1 

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1)
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 736, P =
4.5e-73

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA,
complete cds., N = 1, Score = 345, P = 2e-31

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1
precursor - human, N = 1, Score = 337, P = 1.7e-30



>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein
1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete
cds.

Length = 188 

HSPs: 



Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97
Identities = 160/185 (86%), Positives = 170/185 (91%)

Query: 61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120

MLHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL HWGR

Sbjct: 1 MLHNKLRGQVYPPASNMEYMTWDEELERSAAAWAQRCLWEHGPASLLVSIGQNLAVHWGR 60

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180

YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC

Sbjct: 61 YRSPGFHVQSWYDEVKDYTYPYPHECNPWCPERCSGAMCTHYTQMVWATTNKIGCAVHTC 120

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240

R M+VWG++WENAVY VCNYSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y

Sbjct: 121 RSMSVWGDIWENAVYLVCNYSPKGNWIGEAPYKHGRPCSECPSSYGGGCRNNLCYREEHY 180

Query: 241 TPKPE 245

KPE

Sbjct: 181 HQKPE 185





Pedant information for DKFZphtes3_4b4 , frame 1 



Report for DKFZphtes3_4b4 . 1 



[LENGTH] 497 

[MW] 55920.00 

[pI] 8.36

[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25
kDa trypsin inhibitor, complete cds. 6e-78

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078C] 8e-12

[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins

[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins

[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins

[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins

[PIRKW] glycoprotein 5e-22 

[PIRKW] blocked amino end 5e-13 

[PIRKW] brain 9e-30 

[PIRKW] hydrolase 4e-09 

[PIRKW] hemolymph coagulation 4e-09 

[PIRKW] zymogen 4e-09 

[PIRKW] alternative splicing 4e-09 

[PIRKW] sperm 5e-22 

[PIRKW] viroid-induced protein 2e-11

[PIRKW] venom 6e-18

[PIRKW] pyroglutamic acid 2e-11

[PIRKW] transmembrane protein 2e-10 

[PIRKW] serine proteinase 4e-09 

[SUPFAM] C-type lectin homology 4e-09 

[SUPFAM] trypsin homology 4e-09 
[SUPFAM] complement factor H repeat homology 4e-09

[SUPFAM] cysteine-rich secretory protein 1 6e-24 

[SUPFAM] pathogenesis-related leaf protein 7e-15 

[PROSITE] MYRISTYL 8 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[PROSITE] ASN_GLYCOSYLATION 3 

[PROSITE] SCP_AG5_PR1_SC7_2 1 

[PFAM] SCP-like extracellular Proteins 

[KW] All_Beta 

[KW] SIGNAL_PEPTIDE 23 

[KW] LOW_COMPLEXITY 1.21 % 

SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL 
SEG xxxxxx 



PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh 

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 

SEG 

PRD hhhhhhhcccccccccchhhrihhhhhhhhhhhhhhhhcccccccccccccccceeeeecc 

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC

SEG 

PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec 

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 

SEG 

PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc 

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVNYMTQVVRCDTKMKDRCKG 

SEG 

PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc 

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV 

SEG 

PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee 

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE 

SEG 

PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc 

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES 

SEG 

PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecccceeeee 

SEQ LGTPRDGKAFRIFAVRQ 

SEG 

PRD ccccccccceeeeeccc 
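
The PRD lines can be condensed into an overall composition. The sketch below counts the symbols across a list of PRD strings, assuming the usual reading of h, e and c as helix, extended strand and coil; that reading and the function name are assumptions, since the report does not define the symbols.

```python
from collections import Counter


def structure_fractions(prd_lines):
    """Fraction of residues predicted as h, e and c over all PRD lines."""
    counts = Counter("".join(prd_lines).replace(" ", "").lower())
    total = sum(counts.values())
    return {state: counts.get(state, 0) / total for state in "hec"}


# Toy input; real use would pass the PRD strings of one report
print(structure_fractions(["hhhh", "eecc"]))
```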



Prosite for DKFZphtes3_4b4 . 1 



PS00001   27->31     ASN_GLYCOSYLATION   PDOC00001
PS00001   41->45     ASN_GLYCOSYLATION   PDOC00001
PS00001   451->455   ASN_GLYCOSYLATION   PDOC00001
PS00004   181->185   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   276->280   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   464->468   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   170->173   PKC_PHOSPHO_SITE    PDOC00005
PS00005   179->182   PKC_PHOSPHO_SITE    PDOC00005
PS00005   201->204   PKC_PHOSPHO_SITE    PDOC00005
PS00005   228->231   PKC_PHOSPHO_SITE    PDOC00005
PS00005   241->244   PKC_PHOSPHO_SITE    PDOC00005
PS00005   362->365   PKC_PHOSPHO_SITE    PDOC00005
PS00005   471->474   PKC_PHOSPHO_SITE    PDOC00005
PS00005   483->486   PKC_PHOSPHO_SITE    PDOC00005
PS00006   29->33     CK2_PHOSPHO_SITE    PDOC00006
PS00006   75->79     CK2_PHOSPHO_SITE    PDOC00006
PS00006   81->85     CK2_PHOSPHO_SITE    PDOC00006
PS00006   130->134   CK2_PHOSPHO_SITE    PDOC00006
PS00006   453->457   CK2_PHOSPHO_SITE    PDOC00006
PS00006   483->487   CK2_PHOSPHO_SITE    PDOC00006
PS00007   385->393   TYR_PHOSPHO_SITE    PDOC00007
PS00008   111->117   MYRISTYL            PDOC00008
PS00008   115->121   MYRISTYL            PDOC00008
PS00008   174->180   MYRISTYL            PDOC00008
PS00008   204->210   MYRISTYL            PDOC00008
PS00008   227->233   MYRISTYL            PDOC00008
PS00008   300->306   MYRISTYL            PDOC00008
PS00008   447->453   MYRISTYL            PDOC00008
PS00008   470->476   MYRISTYL            PDOC00008
PS01010   195->207   SCP_AG5_PR1_SC7_2   PDOC00772



Pfam for DKFZphtes3_4b4 . 1 



HMM_NAME SCP-like extracellular Proteins

HMM *PQDEQDEWLNkHNDFRQQVGRGLETRGNPGPQPPAsNMnPMVWNDELAt
P + ++E+L HN +R QV P ASNM M+W+DEL +
Query 52 PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 88

HMM IAQnWANQCiFDHHDCCWNHsnYPYGQNIAWWSsTANnPWnWssMIQMWY
A WA+QCI +H ++ + S GQN+ + + ++++ +Q+WY
Query 89 SAAAWASQCIWEHGPTSLLVSI GQNLGAHWG RYRS PGFHVQSWY 132

HMM NEvkDYN YNWNTCkGG NN FmVCGH YTQM VW RrVT f r I GCGR Y I C YC
+EVKDY Y + + +C HYTQ+VW+ T +IGC+ C+
Query 133 DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 182

HMM NNNWrKPDPWKhkWYYVCNYCPpGNYmN*
+ W + W+ +Y VCNY P+GN+++
Query 183 MTVW--GEVWENAVYFVCNYSPKGNWIG 208
DKFZphtes3_4f17



group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to
methyl-CpG-binding proteins.

Methylation at the DNA sequence 5'-CpG is required for mammalian development.
Methyl-CpG-binding proteins bind specifically to methylated DNA via a related amino acid
motif and can repress transcription. The novel protein does not contain such a motif.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to methyl-CpG-binding protein 

extension of HS557771/HSZ78337, 

there are some differences to these sequences 



Sequenced by AGOWA 
Locus: /map="18" 



Insert length: 2320 bp 

Poly A stretch at pos. 2266, polyadenylation signal at pos. 2251



1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGGGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGG AGTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 

351 AGCAGTGAGC CCCGGGATGA GGGTGGAGGG CGCAAGAGGC CTGTCCCTGA
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CAGCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGCAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 

1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC 
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTCTGCCGCC 
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACCTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCAGAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2301 AAAAAAAAAA AAAAAAAAAA 



BLAST Results 
Entry HS557771 from database EMBLEST:

Human chromosome 18 clone 2 mRNA sequence.

Score = 7582, P = 0.0e+00, identities = 1560/1598

Entry HSZ78337 from database EMBLEST: 

H. sapiens mRNA, expressed sequence tag ICRFp507H02194 (5') 
Score = 6339, P = 9.0e-281, identities = 1307/1347 

Entry HS095149 from database EMBL: 
human STS WI-6941. 
Score = 1210, P = 2.2e-49, identities = 246/251 
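
The identity counts in the BLAST entries above can be converted to percentages. The following is a minimal sketch in Python; the function names `parse_identities` and `percent_identity` are illustrative assumptions and are not part of the original analysis pipeline.

```python
import re

def parse_identities(line: str) -> tuple[int, int]:
    """Extract (matches, aligned_length) from a line containing 'identities = a/b'."""
    m = re.search(r"identities\s*=\s*(\d+)/(\d+)", line, flags=re.IGNORECASE)
    if m is None:
        raise ValueError("no identities field found")
    return int(m.group(1)), int(m.group(2))

def percent_identity(matches: int, aligned: int) -> float:
    """Percentage of aligned positions that are identical."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100.0 * matches / aligned

hits = [
    "Score = 7582, P = 0.0e+00, identities = 1560/1598",
    "Score = 6339, P = 9.0e-281, identities = 1307/1347",
    "Score = 1210, P = 2.2e-49, identities = 246/251",
]
for line in hits:
    a, b = parse_identities(line)
    print(f"{a}/{b} = {percent_identity(a, b):.1f}%")
```

All three nucleotide hits come out at roughly 97 to 98 percent identity, which is in keeping with the note that the clone extends HS557771/HSZ78337 with some differences.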



Medline entries 



98449942: 

Identification and characterization of a family of mammalian methyl-CpG 
binding proteins . 

9824997: 

Gene silencing by methyl-CpG-binding proteins. 



Peptide information for frame 3 



ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 
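
The stated peptide length follows from the ORF coordinates by simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; these conventions are inferred from the numbers and are not stated in the report. The function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons spanned by an ORF with 1-based inclusive coordinates.

    Assumes the stop codon is not included in [start, end].
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3

# ORF from 57 bp to 2024 bp
print(orf_peptide_length(57, 2024))
```

With these assumptions the 57 to 2024 range spans 1968 nucleotides, that is, 656 codons, which agrees with the reported peptide length.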



1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL
651 RSSADR



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_4f17, frame 3

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid
F52B11, N = 2, Score = 316, P = 8.8e-27

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene,
partial cds., N = 2, Score = 163, P = 2.8e-13

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative
transcriptional regulatory protein, phd finger containing"; S.pombe
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA,
complete cds., N = 2, Score = 189, P = 7.6e-11



>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 
Length = 523 

HSPs: 

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27
Identities = 100/336 (29%), Positives = 167/336 (49%) 
Query: 333 REKKSEKKKEERYKRHRQ-KQKHKDKWKHPERADAKDPASLP-QCLGPGCVRPAQPSSKY 390
           + ++K+ E Y +R +Q+ D + + +A +P P QCL P C+ ++ SKY
Sbjct: 118 QQRKANIINERDYVPNRPTRQQSADLRRKRTQLNA-EPDKHPRQCLNPNCIYESRIDSKY 176

Query: 391 CSDDCGMKLAANRIYEILPQRIQQW QQSPCI AEEHGKKLLERI RREQQS ARTRLQ 445
           CSD+CG +LA R+ EILP R +Q+ P E+ K +1 RE Q +
Sbjct: 177 CSDECGKELARMRLTEILPNRCKQYFFEGPSGGPRSLEDEIKPKRAKINREVQKLTESEK 236

Query: 446 EMERRFHEL-EAIILRAKQQAVREDEESNEGDSDDTDLQIFCVSCGHPINPRVAL-RHME 503
           M ++L E I + K Q + +E D +L C+ CG P P + +H+E
Sbjct: 237 NMMAFLNKLVEFI KTQLKLQPLGTEERY DDNLYEGCI VCGLPDI PLLKYTKHIE 290

Query: 504 RCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKTYCKRLQVLCPEHSRDPKVPADEV 563
           C+A+ E SFG+ P + +C+ Y+ ++ ++CKRL+ LCPEH + +V
Sbjct: 291 LCWARSEKAISFGA--PEK--NNDMFYCEKYDSRTNSFCKRLKSLCPEHRKLGDEQHLKV 346

Query: 564 CGCP LVRDVFELTGDF CRLPKRQCNRHYCWEKLRRAEVDLERVR 607
           CG P V ++ E+ F CR K C++H+ W R ++LE+
Sbjct: 347 CGYPKKWEDGMIETAKTVSELIEMEDPFGEEGCRTKKDACHKHHKWIPSLRGTIELEQAC 406

Query: 608 VWYKLDELFEQ--ERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSA 654
           + + K+ EL + + N T A L++M+H+ + + LR+ A
Sbjct: 407 LFQKMYELCHEMHKLNAHAEWTTNA--LSIMMHKQPSTEKCSFFLRNFA 453

Score = 53 (8.0 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27
Identities = 24/100 (24%), Positives = 41/100 (41%)

Query: 169 CGECEACRRTEDCGHCDFCR DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFPSS 222
           C C C ++CG C CR DM+K F +K + RQ + + +
Sbjct:  17 CMNCIRCNDEKNCGTCWPCRNGKTCDMRKCFSAKRLYNEKVK-RQTDENLK-AIMAKTAQ 74

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264
           + + P P+ +QQ + +K GR + G A++
Sbjct:  75 REAAHQAATTTAPSAPVVIEQQVE-KKKRGRKKGSGNGGAAAA 116

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26
Identities = 13/39 (33%), Positives = 19/39 (48%)

Query: 179 EDCGHCDFCRDMKKFGG--PNKIRQKCRLRQCQLRARESY 216
           EC+CCDKG P++C +R+C A+ Y
Sbjct:  15 ERCMNCIRCNDEKNCGTCWPCRNGKTCDMRKC-FSAKRLY 53





Pedant information for DKFZphtes3_4f17, frame 3

Report for DKFZphtes3_4f17.3

[LENGTH]   656
[MW]       75711.71
[pI]       8.61
[HOMOL]    TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e-25
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10
[FUNCAT]   04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04
[PROSITE]  MYRISTYL 6
[PROSITE]  AMIDATION 2
[PROSITE]  CK2_PHOSPHO_SITE 8
[PROSITE]  TYR_PHOSPHO_SITE 3
[PROSITE]  GLYCOSAMINOGLYCAN 1
[PROSITE]  PKC_PHOSPHO_SITE 9
[KW]       All_Alpha
[KW]       LOW_COMPLEXITY 18.75 %
[KW]       COILED_COIL 4.57 %



SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK 

SEG 

PRD cccccccccccccccccccccccccceeeeeeccccceeeeecccccccccccchhhhhh

COILS 

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR

SEG 

PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc 

COILS 

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED 

SEG xxxxxxxxx 

PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc 

COILS 
SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc 

COILS 

SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

COILS 

SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh 

COILS 

SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC 

SEG 

PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch 

COILS 

SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 



SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT 

SEG x 

PRD ceeeeeeeccccccccchhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccc 

COILS 

SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE 

SEG 

PRD cchhhhhhhccccccccccceeeeccccchhhhhccccccccccccccchhhhhhhhhhh 

COILS 

SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 

COILS 



Prosite for DKFZphtes3_4f17.3

PS00002   124->128   GLYCOSAMINOGLYCAN   PDOC00002
PS00005   58->61     PKC_PHOSPHO_SITE    PDOC00005
PS00005   165->168   PKC_PHOSPHO_SITE    PDOC00005
PS00005   215->218   PKC_PHOSPHO_SITE    PDOC00005
PS00005   248->251   PKC_PHOSPHO_SITE    PDOC00005
PS00005   265->268   PKC_PHOSPHO_SITE    PDOC00005
PS00005   337->340   PKC_PHOSPHO_SITE    PDOC00005
PS00005   387->390   PKC_PHOSPHO_SITE    PDOC00005
PS00005   439->442   PKC_PHOSPHO_SITE    PDOC00005
PS00005   627->630   PKC_PHOSPHO_SITE    PDOC00005
PS00006   6->10      CK2_PHOSPHO_SITE    PDOC00006
PS00006   17->21     CK2_PHOSPHO_SITE    PDOC00006
PS00006   227->231   CK2_PHOSPHO_SITE    PDOC00006
PS00006   265->269   CK2_PHOSPHO_SITE    PDOC00006
PS00006   280->284   CK2_PHOSPHO_SITE    PDOC00006
PS00006   308->312   CK2_PHOSPHO_SITE    PDOC00006
PS00006   521->525   CK2_PHOSPHO_SITE    PDOC00006
PS00006   652->656   CK2_PHOSPHO_SITE    PDOC00006
PS00007   339->346   TYR_PHOSPHO_SITE    PDOC00007
PS00007   500->507   TYR_PHOSPHO_SITE    PDOC00007
PS00007   211->219   TYR_PHOSPHO_SITE    PDOC00007
PS00008   42->48     MYRISTYL            PDOC00008
PS00008   123->129   MYRISTYL            PDOC00008
PS00008   125->131   MYRISTYL            PDOC00008
PS00008   129->135   MYRISTYL            PDOC00008
PS00008   259->265   MYRISTYL            PDOC00008
PS00008   396->402   MYRISTYL            PDOC00008
PS00009   107->111   AMIDATION           PDOC00009
PS00009   425->429   AMIDATION           PDOC00009

(No Pfam data available for DKFZphtes3_4f17.3)
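
Short Prosite patterns of the kind listed in the table can be located with ordinary regular expressions. The sketch below uses the published consensus forms of PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]); the helper name `scan` is hypothetical, and the end coordinate is computed as start plus motif length only because that appears to reproduce the "start->end" notation of the table, which is an assumption.

```python
import re

# Regular-expression forms of two PROSITE consensus patterns.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",   # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",  # PS00006: [ST]-x(2)-[DE]
}

def scan(seq: str, name: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for every match, including overlapping ones.

    start is 1-based; end = start + motif length (assumed table convention).
    """
    # A lookahead with a capturing group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + PATTERNS[name] + "))")
    hits = []
    for m in regex.finditer(seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits

# First 60 residues of the DKFZphtes3_4f17.3 peptide, taken from the SEQ line above.
N_TERM = "MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK"

print(scan(N_TERM, "PKC_PHOSPHO_SITE"))
print(scan(N_TERM, "CK2_PHOSPHO_SITE"))
```

On this N-terminal fragment the scan reports one PKC site starting at residue 58 and CK2 sites starting at residues 6 and 17, consistent with the first rows of the table. Such short patterns occur frequently by chance, so a hit is only a candidate site.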
DKFZphtes3_4f5



group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins.

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of G
proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, a cytochrome c family
heme-binding site signature is present. The protein is larger (790 amino acids) than the usual
eukaryotic G-beta transducins (about 340 amino acids).

The new protein can find application in modulating/blocking G-protein-dependent pathways.



similarity to S.pombe "beta-transducin" 

complete cDNA, complete cds, EST hits

on genomic level encoded by HS313D11, at least 7 exons; these exons
match only partially with the predicted transcripts in HS313D11

Sequenced by AGOWA
Locus: /map="16p13.3"
Insert length: 3166 bp 

No poly A stretch found, no polyadenylation signal found 



1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 

51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CTGGTTCTCG 

101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC AGGAACCCTG 

151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 

201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 

251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 

301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 

351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 

401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 

451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC 

501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 

551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 

601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG GGACCAAGCA 

651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG ACCCACCCAG 

701 GCTGACCAGG CCAGCCCACC TCACTGACCT CCTGACCCCT GACCTCATCA 

751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG TGACCACAGC CCTGGGTGGC 

801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATG CTCCCGCCAA 

851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 

901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 

951 AACCTGCGTG TGGGGCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT 

1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 

1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 

1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 

1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 

1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 

1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC 

1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG 

1351 ACCGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 

1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 

1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG GAGATGCACT 

1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 

1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 

1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT GAGGAACACC 

1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC ACCCCCACGA CCCCTCCTTC 

1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 

1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC 

1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT GGCTGCCGAG 

1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 

1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 

1951 TCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCG CTGGTTTGTG 

2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG 

2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 

2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC

2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 

2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 

2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG 

2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 

2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG 
2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT

2451 GAGTGCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA

2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG

2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC

2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC

2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC

2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG

2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA

2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG

2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC

2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCAGCCGGGG

2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC

3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC

3051 GGCCACCTGC AGCACATCAT GAAGTGGCTG GAAGGCAGCT CCCACTGTCC

3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG

3151 CTTGCCCGGG CGGCCG



BLAST Results 



Entry HS313D11 from database EMBL: 

Human DNA sequence from cosmid 313D11 from a contig on the short arm of 
chromosome 16. Contains ESTs, STS and CpG islands. 
Score = 6238, P = 0.0e+00, identities = 1318/1391 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 
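
The peptide listed below is the conceptual translation of the ORF. A minimal sketch of such a translation with the standard genetic code follows; the function name `translate` and the compact codon-table construction are illustrative choices, not the tool used to produce the report.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks unrecognised codons
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# First 21 nucleotides of the ORF (positions 762-782 of the insert above).
print(translate("ATGGAGAAGATGTCCCGTGTG"))
```

The first seven codons translate to MEKMSRV, which matches the start of the peptide listing.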



1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF
401 ETEPGGGGMR WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS



BLASTP hits 



Entry YDSB_SCHPO from database SWISSPROT:

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product:
"beta-transducin"; S.pombe chromosome I cosmid c4F8.
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639

Entry PEX7_HUMAN from database SWISSPROT:

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7).
>TREMBL:HSU76560_1 gene: "Pex7"; product: "peroxisome targeting signal
2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA,
complete cds. >TREMBL:HSU88871_1 gene: "HsPEX7"; product: "HsPex7p";
Human HsPex7p (HSPEX7) mRNA, complete cds.
Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244

Entry PEX7_MOUSE from database SWISSPROT:

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7).
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus
peroxisomal PTS2 receptor mRNA, complete cds.
Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240

Entry ATAC2294_7 from database TREMBL:

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic
sequence, complete sequence.
Score = 232, P = 3.4e-14, identities = 68/260, positives = 120/260

Entry S66835 from database PIR:

probable membrane protein YOL138C - yeast (Saccharomyces cerevisiae)
>TREMBL:SCYOL138C_1 S. cerevisiae chromosome XV reading frame ORF
YOL138C
Score = 136, P = 2.5e-13, identities = 24/77, positives = 44/77



Alert BLASTP hits for DKFZphtes3_4f5, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_4f5, frame 3

Report for DKFZphtes3_4f5.3

[LENGTH]   790
[MW]       88207.10
[pI]       6.05
[HOMOL]    SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN CHROMOSOME I. 9e-44
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16
[FUNCAT]   10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11
[FUNCAT]   06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11
[FUNCAT]   03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11
[FUNCAT]   09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11
[FUNCAT]   04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11
[FUNCAT]   30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10
[FUNCAT]   04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 9e-09
[FUNCAT]   04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07
[FUNCAT]   30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL195w] 2e-07
[FUNCAT]   08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL195w] 2e-07
[FUNCAT]   30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07
[FUNCAT]   06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 4e-07
[FUNCAT]   08.10 peroxisomal transport [S. cerevisiae, YDR142c] 4e-07
[FUNCAT]   08.01 nuclear transport [S. cerevisiae, YER107c] 4e-07
[FUNCAT]   04.07 rna transport [S. cerevisiae, YER107c] 4e-07
[FUNCAT]   30.03 organization of cytoplasm [S. cerevisiae, YER107c] 4e-07
[FUNCAT]   03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 5e-07
[FUNCAT]   06.13 proteolysis [S. cerevisiae, YGL003c] 5e-07
[FUNCAT]   04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 8e-07
[FUNCAT]   04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06
[FUNCAT]   03.13 meiosis [S. cerevisiae, YLR129w] 3e-06
[FUNCAT]   03.25 cytokinesis [S. cerevisiae, YCR057c] 1e-05
[FUNCAT]   03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 1e-05
[FUNCAT]   06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL055w] 2e-04
[FUNCAT]   30.04 organization of cytoskeleton [S. cerevisiae, YOR272w] 6e-04
[SCOP]     d1gotb_ 2.46.3.1.1 beta1-subunit of the signal-transducing 5e-06
[PIRKW]    duplication 7e-10
[PIRKW]    signal transduction 7e-08
[PIRKW]    peroxisome 9e-06
[PIRKW]    heterotrimer 7e-08
[PIRKW]    GTP binding 7e-08
[PIRKW]    peroxisome biogenesis 9e-06
[PIRKW]    transmembrane protein 1e-14
[SUPFAM]   MSI1 protein 7e-10
[SUPFAM]   WD repeat homology 1e-14
[SUPFAM]   GTP-binding regulatory protein beta chain 7e-08
[SUPFAM]   PRL1 protein 3e-08
[SUPFAM]   coatomer complex beta' chain 1e-06
[PROSITE]  CYTOCHROME_C 1
[PROSITE]  WD_REPEATS 3
[PROSITE]  MYRISTYL 10
[PROSITE]  AMIDATION 2
[PROSITE]  CAMP_PHOSPHO_SITE 2
[PROSITE]  CK2_PHOSPHO_SITE 11
[PROSITE]  TYR_PHOSPHO_SITE 1
[PROSITE]  PKC_PHOSPHO_SITE 7
[PROSITE]  ASN_GLYCOSYLATION 4
[PFAM]     WD domain, G-beta repeats
[KW]       All_Beta
[KW]       3D
[KW]       LOW_COMPLEXITY 2.28 %



SEQ MEKMSRVTTALGGSVLTGRTMHCHLDAPANAISVCRDAAQVVVAGRSIFKIYAIEEEQFV

SEG 

lgotB 

SEQ EKLNLRVGRKPSLNLSCADVVWHQMDENLLATAATNGVVVTWNLGRPSRNKQDQLFTEHK 

SEG 

lgotB TTCEEEEEETTTEEEEEET-TTTCEEE-- EEECCC 

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVQFSIRDYFTFA 

SEG 

lgotB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTTEEEEEECBTTCCEEEEEETTTTTEEEE 

SEQ STFENGNVQLWDIRRPDRCERMFTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWDMTTH 

SEG 

lgotB E-ETTTEEEEEETTTTEEEE-EEECCCCCEEEEEE-TTTTCCEEEEETTTEEEEEC . . . . 

SEQ RAKEMHCVQTIASVARVKWRPECRHHLATCSMMVDHNIYVWDVRRPFVPAAMFEEHRDVT 

SEG 

lgotB 

SEQ TGIAWRHPHDPSFLLSGSKDSSLCQHLFRDASQPVERANPEGLCYGLFGDLAFAAKESLV 

SEG 

lgotB 

SEQ AAESGRKPYTGDRRHPIFFKRKLDPAEPFAGLASSALSVFETEPGGGGMRWFVDTAERYA 

SEG 

lgotB 

SEQ LAGRPLAELCDHNAKVARELGRNQVAQTWTMLRIIYCSPGLVPTANLNHSVGKGGSCGLP

SEG 

lgotB 

SEQ LMNSFNLKDMAPGLGSETRLDRSKGDARSDTVLLDSSATLITNEDNEETEGSDVPADYLL 

SEG xxxx 

lgotB 

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPQEAFPLRHEIVDTPPGPEHLQDKADSPHVSGS 

SEG xxxxxxxxxxxxxx 

lgotB 

SEQ EADVASLAPVDSSFSLLSVSHALYDSRLPPDFFGVLVRDMLHFYAEQGDVQMAVSVLIVL 

SEG 

lgotB 

SEQ GERVRKDIDEQTQEHWYTSYIDLLQRFRLWNVSNEVVKLSTSRAVSCLNQASTTLHVNCS

SEG 

lgotB 

SEQ HCKRPMSSRGWVCDRCHRCASMCAVCHHVVKGLFVWCQGCSHGGHLQHIMKWLEGSSHCP 

SEG 

lgotB 

SEQ AGCGHLCEYS 

SEG 

lgotB 



Prosite for DKFZphtes3_4f5.3

PS00001   74->78     ASN_GLYCOSYLATION   PDOC00001
PS00001   468->472   ASN_GLYCOSYLATION   PDOC00001
PS00001   691->695   ASN_GLYCOSYLATION   PDOC00001
PS00001   718->722   ASN_GLYCOSYLATION   PDOC00001
PS00004   69->73     CAMP_PHOSPHO_SITE   PDOC00004
PS00004   152->156   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   17->20     PKC_PHOSPHO_SITE    PDOC00005
PS00005   165->168   PKC_PHOSPHO_SITE    PDOC00005
PS00005   172->175   PKC_PHOSPHO_SITE    PDOC00005
PS00005   239->242   PKC_PHOSPHO_SITE    PDOC00005
PS00005   364->367   PKC_PHOSPHO_SITE    PDOC00005
PS00005   701->704   PKC_PHOSPHO_SITE    PDOC00005
PS00005   727->730   PKC_PHOSPHO_SITE    PDOC00005
PS00006   76->80     CK2_PHOSPHO_SITE    PDOC00006
PS00006   165->169   CK2_PHOSPHO_SITE    PDOC00006
PS00006   172->176   CK2_PHOSPHO_SITE    PDOC00006
PS00006   181->185   CK2_PHOSPHO_SITE    PDOC00006
PS00006   398->402   CK2_PHOSPHO_SITE    PDOC00006
PS00006   498->502   CK2_PHOSPHO_SITE    PDOC00006
PS00006   503->507   CK2_PHOSPHO_SITE    PDOC00006
PS00006   522->526   CK2_PHOSPHO_SITE    PDOC00006
PS00006   598->602   CK2_PHOSPHO_SITE    PDOC00006
PS00006   600->604   CK2_PHOSPHO_SITE    PDOC00006
PS00006   679->683   CK2_PHOSPHO_SITE    PDOC00006
PS00007   337->346   TYR_PHOSPHO_SITE    PDOC00007
PS00008   13->19     MYRISTYL            PDOC00008
PS00008   97->103    MYRISTYL            PDOC00008
PS00008   139->145   MYRISTYL            PDOC00008
PS00008   161->167   MYRISTYL            PDOC00008
PS00008   317->323   MYRISTYL            PDOC00008
PS00008   342->348   MYRISTYL            PDOC00008
PS00008   391->397   MYRISTYL            PDOC00008
PS00008   460->466   MYRISTYL            PDOC00008
PS00008   474->480   MYRISTYL            PDOC00008
PS00008   759->765   MYRISTYL            PDOC00008
PS00009   67->71     AMIDATION           PDOC00009
PS00009   364->368   AMIDATION           PDOC00009
PS00190   743->749   CYTOCHROME_C        PDOC00169
PS00678   90->105    WD_REPEATS          PDOC00574
PS00678   223->238   WD_REPEATS          PDOC00574
PS00678   269->284   WD_REPEATS          PDOC00574



Pfam for DKFZphtes3_4f5.3



HMM_NAME WD domain, G-beta repeats 

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 

++ HN++V C+ ++P+- R +++G++D+ +++WD 
Query 203 FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236 
DKFZphtes3_4h6



group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP binding and hydrolysis. The novel protein is similar to kinesin light chain, which is part 
of the functional kinesin holoenzyme tetrameric protein. The light chain has been proposed to 
function in coupling of cargo to the heavy chain or in the modulation of the ATPase activity 
of the heavy chain. The novel protein contains two kinesin light chain repeats and one RGD 
cell-attachment site. 

The novel kinesin protein can find application in modulating the function of kinesin and 
modulating intracellular transport via/on microtubules. 



strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown

Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893
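
A poly A stretch and an upstream polyadenylation signal of the kind reported here can be located with a simple scan. The sketch below is illustrative only: the function names, the minimum run length of 10 and the use of the canonical AATAAA hexamer are assumptions, and the coordinate convention of the original report is not stated, so the 1-based positions returned here need not coincide with the reported ones.

```python
import re

def find_polya(seq: str, min_run: int = 10):
    """1-based start of the first run of at least min_run A residues, or None."""
    m = re.search("A{%d,}" % min_run, seq.upper())
    return m.start() + 1 if m else None

def find_signal(seq: str, before: int, signal: str = "AATAAA"):
    """1-based start of the last occurrence of signal lying entirely upstream
    of position `before` (1-based), or None if there is none."""
    idx = seq.upper().rfind(signal, 0, before - 1)
    return idx + 1 if idx >= 0 else None

# Toy sequence (not taken from the clone): signal, spacer, then a poly A tail.
toy = "CCGT" + "AATAAA" + "TTTTGAAGAAACC" + "A" * 12
tail = find_polya(toy)
print(tail, find_signal(toy, tail))
```

Searching for the last signal upstream of the tail, rather than the first in the sequence, reflects the usual location of the hexamer a short distance before the cleavage site.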



1 GGCGGGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 
51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 
101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 
151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 
201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 
251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 
301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 
351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 
401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GGCGCAGGTG CGGCGTCTGG
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCCAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGAGAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTCTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 
1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 
1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG ACAAGGTCCT GGGCAAGTTT 
1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 
1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 
1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 
1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA
1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 
1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 
1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 
1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 
1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 
1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 
1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 
1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 
1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 
1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC 
1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 
1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 
1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 
1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 
2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 
2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 
2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 
2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 
2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 
2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 
2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 
2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 
2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 
2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGGCC TTAATCACCC 
2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 
2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 
2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 
2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 
2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 
2751 CACCGCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 
2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 
2851 GCGGGTGAGG CGGCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 
2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



98288268 : 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 



Peptide information for frame 3 



ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265) 
KINESIN_LIGHT (265-307) 
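
The stated peptide length follows directly from the ORF coordinates. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated ORF end; the helper name `orf_peptide_length` is hypothetical and is not part of the original analysis.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumption: the stop codon is not included in [start_bp, end_bp].
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF of DKFZphtes3_4h6, frame 3: 144 bp to 2009 bp
print(orf_peptide_length(144, 2009))
```

Under these assumptions the span of 1866 bp corresponds to 622 codons, in agreement with the peptide length given above.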



1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 
51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 
101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 
151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 
201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 
251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 
301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 
351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 
401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 
451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG 
501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 
551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 
601 LSDSRTLSSS SMDLSRRSSL VG 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score 
= 2824, P = 4e-294 

PIR:153013 kinesin light chain - human, N = 1, Score = 1927, P = 
4.5e-199 

PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P = 
3.2e-198 

SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P = 
3.2e-198 



>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 

HSPs : 
Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294 
Identities = 558/598 (93%), Positives = 572/598 (95%) 
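
The percentages printed next to the identity and positive counts can be reproduced from the raw counts. The sketch below assumes the report truncates (rather than rounds) to a whole percent; this is inferred only from the figures in this document (for example, 200/513 is reported as 38%), not from the BLAST documentation. The helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero.

    Assumption: truncation is inferred from the values printed in the
    report, not taken from the BLAST program documentation.
    """
    return (100 * matches) // aligned_length


# Alignment of DKFZphtes3_4h6 (frame 3) against TREMBL:AF055666_1
print(blast_percent(558, 598), blast_percent(572, 598))
```

With the counts above this gives 93 and 95, matching the reported identities and positives.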

Query: 1 MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60 

MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL + EAGEAEPGSQERC+L 
Sbjct: 1 MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60 

Query: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 

LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 
Sbjct: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120 

Query: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 180 

QRSEQAVAQLEEEKQHLLFMSQIRKLDE P EEKGDVPKD+LDDLFPNEDEQSPAPSP 
Sbjct: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDE-MLPQEEKGDVPKDSLDDLFPNEDEQSPAPSP 179 

Query: 181 GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240 

GGGDV+ QHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 
Sbjct: 180 GGGDVAAQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 239 

Query: 241 TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 300 

TMLNILALVYRDQNKYK+AAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 
Sbjct: 240 TMLNILALVYRDQNKYKDAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 299 

Query: 301 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 360 

AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 
Sbjct: 300 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 359 

Query: 361 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 420 

DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNG+NKPIWMHAEEREE 
Sbjct: 360 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENKPIWMHAEEREE 419 

Query: 421 SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 480 

SKDKRRD P EYGSWYKACKVDSPTVNTTLR+LGALYR +GKLEAAHTLEDCASR+RK 
Sbjct: 420 SKDKRRDRRPM-EYGSWYKACKVDSPTVNTTLRTLGALYRPEGKLEAAHTLEDCASRSRK 478 

Query: 481 QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 540 

QGLDPASQTKVVELLKDGSGR G RR SRD+AG P+SESDLE+ GP AEW+GDGSGSL 
Sbjct: 479 QGLDPASQTKVVELLKDGSGR-GHRRGSRDVAG PQSESDLEESGPAAEWSGDGSGSL 534 

Query: 541 RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGG 598 

RRSGSFGKLRDALRRSSEMLV+KLQGG PQEP N RMKRASSLNFLNKSVEEP QPGG 
Sbjct: 535 RRSGSFGKLRDALRRSSEMLVRKLQGCGPQEP-NSRMKRASSLNFLNKSVEEPVQPGG 591 



Pedant information for DKFZphtes3_4h6, frame 3 



Report for DKFZphtes3_4h6.3 

[LENGTH] 622 
[MW] 68934.82 
[pI] 6.72 
[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0 
[BLOCKS] BL00927C Trehalase proteins 
[BLOCKS] BL01160I Kinesin light chain repeat proteins 
[BLOCKS] BL01160H Kinesin light chain repeat proteins 
[BLOCKS] BL01160G Kinesin light chain repeat proteins 
[BLOCKS] BL01160F Kinesin light chain repeat proteins 
[BLOCKS] BL01160E Kinesin light chain repeat proteins 
[BLOCKS] BL01160D Kinesin light chain repeat proteins 
[BLOCKS] BL01160C Kinesin light chain repeat proteins 
[BLOCKS] BL01160B Kinesin light chain repeat proteins 
[BLOCKS] BL01160A Kinesin light chain repeat proteins 
[SUPFAM] tetratricopeptide repeat homology 1e-07 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 8 
[PROSITE] KINESIN_LIGHT 2 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 5 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Kinesin light chain repeat 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 12.54 % 
[KW] COILED_COIL 4.98 % 

SEQ MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 
SEG 
PRD ccccchhhhhhhhhhhhhchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccchhhhh 
COILS 

SEQ LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 
SEG 
PRD hhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhh 
COILS cccccccccccc 

SEQ QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccccc 
COILS ccccccccccccccccccc 

SEQ GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 
SEG 
PRD cccccccccccccchhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhhhhccccccchh 
COILS 

SEQ TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 
SEG xxxxxxxxxxxx 
PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhcccccchh 
COILS 

SEQ AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 
SEG 
PRD hhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhccc 
COILS 

SEQ DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 
SEG xxxxx 
PRD ccccccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhh 
COILS 

SEQ SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 
SEG xxxxxxxx 
PRD hhhhhccccccccccccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEQ QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 
SEG xxxxxxxxxxxxxx xxxxx 
PRD hhccchhhhhhhhhhccccccccccccccccccccccccccccccceeeecccccccccc 
COILS 

SEQ RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGGTG 
SEG xxxxxxxxxx xxxx 
PRD ccccccchhhhhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccc 
COILS 

SEQ LSDSRTLSSSSMDLSRRSSLVG 
SEG xxxxxxxxxxxxxxxxxxxx.. 
PRD cccccccccccchhhhhhcccc 
COILS 
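
The low-complexity and coiled-coil percentages in the Pedant report can be cross-checked against the SEG and COILS tracks shown above. The sketch below assumes that each percentage is the number of flagged residues divided by the peptide length (622); the run lengths are counted from the "x" and "c" runs in the listing, and the names `SEG_RUNS`, `COILS_RUNS` and `percent_covered` are hypothetical.

```python
PEPTIDE_LENGTH = 622

# Lengths of the 'x' runs on the SEG track, in order of appearance.
SEG_RUNS = [12, 5, 8, 14, 5, 10, 4, 20]
# Lengths of the 'c' runs on the COILS track, in order of appearance.
COILS_RUNS = [12, 19]


def percent_covered(runs, length):
    """Share of residues flagged by a track, as a percentage (2 decimals)."""
    return round(100 * sum(runs) / length, 2)


print(percent_covered(SEG_RUNS, PEPTIDE_LENGTH))    # low complexity
print(percent_covered(COILS_RUNS, PEPTIDE_LENGTH))  # coiled coil
```

Under this assumption, 78 flagged residues give 12.54 % low complexity, in agreement with the report, and 31 flagged residues give 4.98 % coiled coil.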



Prosite for DKFZphtes3_4h6.3 

PS00001 449->453 ASN_GLYCOSYLATION PDOC00001 
PS00001 587->591 ASN_GLYCOSYLATION PDOC00001 
PS00004 425->429 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 505->509 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 554->558 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 578->582 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 616->620 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 451->454 PKC_PHOSPHO_SITE PDOC00005 
PS00005 499->502 PKC_PHOSPHO_SITE PDOC00005 
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005 
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005 
PS00005 615->618 PKC_PHOSPHO_SITE PDOC00005 
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006 
PS00006 151->155 CK2_PHOSPHO_SITE PDOC00006 
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006 
PS00006 232->236 CK2_PHOSPHO_SITE PDOC00006 
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006 
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 521->525 CK2_PHOSPHO_SITE PDOC00006 
PS00006 568->572 CK2_PHOSPHO_SITE PDOC00006 
PS00006 589->593 CK2_PHOSPHO_SITE PDOC00006 
PS00006 610->614 CK2_PHOSPHO_SITE PDOC00006 
PS00007 339->346 TYR_PHOSPHO_SITE PDOC00007 
PS00007 339->347 TYR_PHOSPHO_SITE PDOC00007 
PS00007 424->432 TYR_PHOSPHO_SITE PDOC00007 
PS00008 72->77 MYRISTYL PDOC00008 
PS00008 86->92 MYRISTYL PDOC00008 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 187->193 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 482->488 MYRISTYL PDOC00008 
PS00008 598->604 MYRISTYL PDOC00008 
PS00008 600->606 MYRISTYL PDOC00008 
PS00009 292->296 AMIDATION PDOC00009 
PS00009 499->503 AMIDATION PDOC00009 
PS00016 502->505 RGD PDOC00016 
PS01160 223->265 KINESIN_LIGHT PDOC00893 
PS01160 265->307 KINESIN_LIGHT PDOC00893 
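
Several of the short Prosite hits listed above can be reproduced with a plain pattern scan over the peptide. The following is a minimal sketch, not the tool used to generate the report: the regular expressions are hand translations of three PROSITE patterns (RGD; N-{P}-[ST]-{P}; [ST]-x-[RK]), and the "start->end" convention (1-based start, end equal to start plus motif length) is inferred from the table. The names `PEPTIDE`, `PATTERNS` and `scan` are hypothetical.

```python
import re

# Peptide for DKFZphtes3_4h6, frame 3 (622 residues), as listed in this section.
PEPTIDE = (
    "MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEA"
    "EPGSQERCILLRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVR"
    "RLVQENQWLREELAGTQQKLQRSEQAVAQLEEEKQHLLFMSQIRKLDEDA"
    "SPNEEKGDVPKDTLDDLFPNEDEQSPAPSPGGGDVSGQHGGYEIPARLRT"
    "LHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVATMLNILALVY"
    "RDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE"
    "AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRA"
    "LEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKE"
    "FGSVNGDNKPIWMHAEEREESKDKRRDSAPYGEYGSWYKACKVDSPTVNT"
    "TLRSLGALYRRQGKLEAAHTLEDCASRNRKQGLDPASQTKVVELLKDGSG"
    "RRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSLRRSGSFGKLR"
    "DALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGGTG"
    "LSDSRTLSSSSMDLSRRSSLVG"
)

# Hand-translated regular expressions (assumed equivalents of the PROSITE patterns).
PATTERNS = {
    "RGD": "RGD",
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": "[ST].[RK]",       # [ST]-x-[RK]
}


def scan(peptide, regex):
    """Return (start, end) pairs; start is 1-based, end = start + motif length.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    hits = []
    for match in re.finditer("(?=(" + regex + "))", peptide):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{name} {start}->{end}")
```

For these three patterns the scan returns the same positions as the corresponding rows of the table, which supports the reading of the second coordinate as an exclusive end.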



Pfam for DKFZphtes3_4h6.3 

HMM_NAME Kinesin light chain repeat 

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
+ALED+-EKT+GHDHPDVATMLN+LALV+R+QNKY+E+ + ++N 
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264 

50.46 265 306 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
AL +REKTLG DHP VA LNNLA+++ ++KY+E+E + + 
Query 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306 

348 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY+ 
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348 

39.10 349 390 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain 
Alignment to HMM consensus: 
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN* 
RALE+ LG D P+VA+ NNLA + Q+KY+++E +Y+ 
Query 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390 
DKFZphtes3_4ol9 



group: testes derived 

DKFZphtes3_4ol9 encodes a novel 1180 amino acid protein with weak similarity to human 
megakaryocyte stimulating factor and human mucin. 

The novel protein contains a cytochrome c family heme-binding site signature. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to megakaryocyte stimulating factor and mucin 

complete cDNA, complete cds, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3767 bp 

Poly A stretch at pos. 3757, polyadenylation signal at pos. 3737 
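
The two positions given for the poly A stretch and the polyadenylation signal can be located by a simple search over the 3' end of the insert. The sketch below uses the last 67 bases of the sequence listing that follows (positions 3701 to 3767) and assumes the canonical AATAAA hexamer as the signal and a run of at least eight A residues as the poly A stretch; both criteria, and the reading of the reported positions as 0-based offsets, are assumptions made for illustration. The names `TAIL`, `TAIL_START` and `to_zero_based` are hypothetical.

```python
import re

TAIL_START = 3701  # 1-based coordinate of the first base of TAIL in the listing
TAIL = (
    "CTTCGTGGGAGGCACTCATGGCTCTCTGGGTCTAATGAATAAAGTCCTCC"  # 3701-3750
    "ACAGCCTAAAAAAAAAA"                                    # 3751-3767
)


def to_zero_based(tail_index: int) -> int:
    """Convert a 0-based index within TAIL to a 0-based offset in the insert."""
    return TAIL_START - 1 + tail_index


# Assumed signal: the canonical AATAAA hexamer.
signal_pos = to_zero_based(TAIL.find("AATAAA"))
# Assumed poly A criterion: first run of eight or more A residues.
polya_pos = to_zero_based(re.search(r"A{8,}", TAIL).start())

print(signal_pos, polya_pos)
```

Under these assumptions the search yields 3737 for the signal and 3757 for the poly A stretch, matching the positions stated above; in 1-based coordinates of the listing these correspond to bases 3738 and 3758.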



1 GGCTAGGTTT AGCTTCAGGG GCAGCCCAGG GCAGTGTTGC TGCATATTGC 
51 ATGGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 
101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 
151 AGCTGACCTG TCCGGTAATC AAGGCAATGC AGCCGGCCGC CTAGCTACAG 
201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 
251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 
301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 
351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGGCTGTGGT CGAGAGCCAG 
401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 
451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC 
501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 
551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 
601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 
651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 
701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGGTCCTCT GCAGACCCCA 
751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT 
801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGGGGCTGGC CTTCCTGCCA 
851 CACCAGACGG TCACCATCAG ATTTCCCTGC CCAGTGAGTT TGGACGCAAA 
901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 
951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 
1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAACCCAGGG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCAAA 
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA GAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA 
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC 
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTGT CTGGCCTCTC 
1701 CAACAGTGGC AAATGTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC 
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA 
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT 
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA 
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA 
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT GCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGGTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG 
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT 
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG 
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT 
2501 GGCCTCAGCG CCCCACCCTG GGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC 
2601 CGGCAGCCTG TGAGGTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC 
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG GGAGACAACG GAGCCACACG 
2701 TGCCCAGCCA TCAATGCCCG GCCAGGCGGT GCCCTGCCAG GAGGACACGG 
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC AATCGTGGAA CCGCGCATGG 
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC ACCTGGCGCA ACAAGGCGGT 
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT GGTGTCCATG CAGGCTGCAG 
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA TCCAGGCGGG CGTCCGTGGC 
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG CACCGGGGGG CCATGGTCAT 
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG GCGGAACCTG GCACACCTCT 
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT GGCGCGGCTA CAGCACCCGC 
3101 CGGGACCAAG CCCGGCACTG GCAGATGCTC CACCCCGTCA CGTGGGTGGA 
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA CCGAAGCTGG TTCCAGGATG 
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT GCTTCCAGTC CTGCCAGGCA 
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC TCCAGGATCG GGAGCCCGCC 
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC TCGCACCTGT CATACCTGTG 
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG GCATGGGCCA GGGCACTGAG 
3401 GGCCCCGGGG CAGTGTCTTG GGCCTCCGCC TACCAGCTGG CTGCCCTGAG 
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC GGCCACAGCC ATCCAGTCCG 
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA TGAGGCAGCA GCAAATGGCA 
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC CACCATACCC GGAGCTGTCT 
3601 GAAGAACACA GAGGCCCTCT TGGGACCAGC AGACCCCTCG GCCAGCTCAC 
3651 GGCACATGCA TTGGCCTGGC ATCTAGGACC CTGGCTCCCT GCAGTGGGGA 
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC 
3751 ACAGCCTAAA AAAAAAA 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 



1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA 
51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD 
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK 
151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL 
201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP 
251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR 
301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL 
351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM 
401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM 
451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM 
501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN 
551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP 
601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP 
651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL 
701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT 
751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE 
801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG 
851 DNGATRAQPS MPGQAVPCQE DTGPADAGVV GGQSWNRAWE PARGAASWDT 
901 WRNKAVVPPR RSGEPMVSMQ AAEEIRILAV ITIQAGVRGY LARRRIRLWH 
951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH 
1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS 
1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGMGQGTEG PGAVSWASAY 
1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH 
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2 

TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds., N = 2, Score = 
242, P = 9.6e-16 

TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene, 
partial cds., N = 1, Score = 204, P = 1.4e-12 

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11 



>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human 
megakaryocyte stimulating factor mRNA, complete cds. 
Length = 1,404 

HSPs: 



Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 145/546 (26%), Positives = 198/546 (36%) 

Query: 282 KRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAETPKAPFQIC-PGPMITKTLL 340 
K+ + T K AP TP + P T AP P P TK+ 
Sbjct: 488 KKPAPTTPKEPAPTTP-KEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAP 546 

Query: 341 QTYPVVSVTLPQ TYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTKTAPHTC 395 
T ^ T + TP TTP K +P PK TP + P PT TK 
Sbjct: 547 TTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKE--PAPTTTKK 599 

Query: 396 PMPTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPP 455 
P PT K + PT TP + + T P T T A P +JX T P 
Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653 

Query: 456 QMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQT-RLPAMIT-KTPAQLRSVAT 513 
+. TTP + PT a T+4-P +P+P T + Pa T K A T 
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKEPAPTTPKETAPTTPKGT 712 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAGTPNTSGSIHENPPKAKATVNVKQAAKVVKA 569 
TL +PT AP ++A T TS PK A K+ A K 
Sbjct: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKPAPTTPKGTAPTTPKEPAPTTPKE 772 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGT--QKQAKTDMAFKTSVAVE 627 
+P+ L +P P T A EL KTT KAT +T+ 
Sbjct: 773 PAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTTTKGPTSTTSDKPAPTTPK-ETAPTTP 831 

Query: 628 MAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAAPL 687 
AP+ K+ P P V+P + S PLSP L 
Sbjct: 832 KEPAPTTPK--KPAPTTPETPPPTTSEVSTPTTTKEPTTIHKSPDESTPELSAEPTPKAL 889 

Query: 688 TKASSQGHLPTELTKTPSLA--HLDTCLSKMHSQTHLATGAVKVQSQAPLAT--CLTKTQ 743 
+ + +PT TKTP+ + T ++ L T + + AP T T T + 
Sbjct: 890 ENSPKEPGVPT--TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946 

Query: 744 SRGQPITDITTCLIPAHQAADLS--SNTHSQVLLTGSKVSN--HACQRLGGLSAPP-WAK 798 
+ TT++ D+ T+ KV+ ++ P AK 
Sbjct: 947 KTTESKITATTTQVTSTTTQDTTPFKITTLKTTTLAPKVTTTKKTITTTEIMNKPEETAK 1006 

Query: 799 PEDRQTQPQPHGHVPGKTTQGGPCPAA 825 
P+DR T + P K T+ P + 
Sbjct: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033 




Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 146/565 (25%), Positives = 209/565 (36%) 

Query: 281 TKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAE--TPKAPFQICPGPMITKT 338 
TK+ + K AP TP +ATP+ PK TP+ P P + T 
Sbjct: 597 TKKPAPTAPKEPAPTTPK ETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTT 652 

Query: 339 LLQTYPVVSVTLPQTYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTK-TAPHTCP 396 
+ P TP*- TP t tp PK TP + P PT K TAP T P 
Sbjct: 653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE--PAPTTPKETAP-TTP 709 

Query: 397 M PTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453 
PT K + PT + P + + PT +S+KP GAT 
Sbjct: 710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTSD--KPAPTTPKGTAPT-T 761 

Query: 454 PPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQTRLPAMITKTPAQLRSVAT 513 
P + P TTP KPT T T + +P KP+P+ P TK P S 
Sbjct: 762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP 818 

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAGTPNTSGSIHENPPKAKATVNV KQAAKVVKA 569 
T +PT AP APT E PP + V+ K+ + K+ 
Sbjct: 819 APTTPKETAPTTPKEPAPTTPKKPA--PTTP ETPPPTTSEVSTPTTTKEPTTIHKS 872 

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGTQKQAKTDMAFKTSVAV 626 
S+P AE + L GVP + P + T T K T+ +T+ 
Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP--TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930 

Query: 627 EMAGAPSWTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 
A AP TK A +K + +T Q+ + T ++ L LA 
Sbjct: 931 TTA-APKMTKETATTTEKT TESKITATTTQVTSTTTQDTTPFKITTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 
+T + + TE+ P +T K + AT K Q + P +T 
Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQKPTKAPKKPTSTKKP 1037 

Query: 741 KTQSR-GQPITDIT TCLIPAHQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 
KT R +P T T T +P + Q ++ N + S 
Sbjct: 1038 KTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVNPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845 
A+ E +PH +P T P QG+++ PM + CN 
Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDMDYLPRVPN-QGIIINPMLSDETNICN 1147 

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11 
Identities = 142/513 (27%), Positives = 200/513 (38%) 

Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263 
R + P +PP G + H V+ + +P L 
Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 RTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQ 315 
T + T L + +V+TK + TNK + E S + Q++ + S AT 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 
+ TPKA GP +T T + P T P+ PAST TP + +P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE-PAST TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC--PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432 
TP + P PT TK+AP T P PT TK + PT + P T PA T K+ P 
Sbjct: 376 TTPKE PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432 

Query: 433 LLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVT 489 
+ K P PA TP + P TTP KPT + T + +P 
Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TTP-KEPAPTTKEPAPT-TPKEPAPTAPK 488 

Query: 490 KPSPQT-RLPAMIT-KTPAQLRSVA TILK TLCLASPTVANVKAPPQVAVAAGT 540 
KP+P T + PA T K PA + T'K T ++PT AP AT 
Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 
P S++P PKA K+A K +P+ E +P P P+ 
Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA--PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 
A' P ++ T K+ K + AP+ + +A + P P + 
Sbjct: 607 EPA--PTTPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTPEEPTPTTPEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 
A T P+ AAP T +PP+PAPT PET T 
Sbjct: 665 APTTPKA--AAPNT PKEPAPTTPKEP--APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 
Sbjct: 717 LKE 719 




Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02 
Identities = 60/214 (28%), Positives = 85/214 (39%) 

Query: 265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA--RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 
T + +H D T +SA T KA +P+ P + A T+P T 
Sbjct: 862 TTKEPTTIHKSPDE-STPELSAEPTPKALENSPKEPGVPTTKTPAATKPEMTTTAKDKTT 920 

Query: 321 ETP--KAPFQICPGPMITK-TLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTIIKT 377 
E P P +TK T T + T T TTT T+P K+T +KT 
Sbjct: 921 ERDLRTTPETTTAAPKMTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978 

Query: 378 PAQMYPGPTVTK TAPHTCPMPTMT-KIQVHPTASRTGTPRQTCPATITAKNRPQVSL 433 
+ PTTK T PTK+ TS+ TP+ P A +P + 
Sbjct: 979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPKDRATNSKATTPKPQKPTK--APKKPTSTK 1035 

Query: 434 LASIMKSL--PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 
M + P+ P P M T P+++P + A+ LQT 
Sbjct: 1036 KPKTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQT 1075 




Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12 
Identities = 17/60 (28%), Positives = 22/60 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P PS E AP P+ + K+ P P E + + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592 

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 17/59 (28%), Positives = 22/59 (37%) 

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA ++ 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489 

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15 
Identities = 15/51 (29%), Positives = 19/51 (37%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71 

T EP T P P P+ + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 12/41 (29%), Positives = 17/41 (41%) 

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

P P P + P +P +KS P++PA T S 

Sbjct: 350 PTPTTPK--EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 15/57 (26%), Positives = 19/57 (33%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77 

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433 

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15 
Identities = 16/58 (27%), Positives = 22/58 (37%) 

Query: 20 LATVHEPVVT QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74 

L T EP T + A P P* + P +P KS P ++PA T 

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401 

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14 
Identities = 15/60 (25%), Positives = 21/60 (35%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P P+- + AP P+ + KE P E + + P 

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522 

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14 
Identities = 15/55 (27%), Positives = 20/55 (36%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76 

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT PKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKS 544 



Pedant information for DKFZphtes3_4ol9, frame 2 



Report for DKFZphtes3_4ol9.2 

[LENGTH] 1180 
[MW] 127693.40 
[pI] 10.25 
[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 6e-06 
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06 
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 6e-06 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06 
[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 12 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] PKC_PHOSPHO_SITE 25 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 5.00 % 

SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK 

SEG 

PRD cccccceeeccccccceeeeeeeeceeeeeeeecccccccceeeeccccccccccccccc 

SEQ SKEHLPQQPAEGKTASRRVPRLRAVVESQAFKNILVDEMDMMHARAATLIQANWRGYWLR 

SEG 

PRD cccccccccccccccccchhhhhhhhhhhhhhheeehhhhhhhhhhhhhhhhhccchhhh 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeecccchhhhhhhhcccccccccceeeecccccce 

SEQ LLSPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH 

SEG 

PRD eeccceeeecccccccccceeeecccccccccccccccccccccccccceeeeeeeeccc 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSR 

SEG 

PRD eeeeeecccccccccccccccccccccceeeeecccccccceeeeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT 

SEG xxxx 

PRD ccceeeeeccccccccceeecccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccceeeccccccccccccccccccccccccccceeeccccccccccccccc 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccecccccc 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP 

SEG xxxxxxxxxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG 

SEG xxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeccccccceeeccccccccccc 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAHQAADLSSNTHSQVLLTGSKV 

SEG 

PRD ccccceeeeeccccceeeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGVVGGQSWNRAWEPARGAASWDT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ WRNKAVVPPRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW 

SEG 

PRD ccceeecccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhhhhhhhh 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSVVMLVGSSPRTCHTCGRTQPTRV 

SEG 

PRD hccceeeeccceeeecccceeeeeeeecccccccccceeeeeecccccccccccccccee 

SEQ VQGMGQGTEGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA 

SEG xxxxxxxxxxxxx 

PRD eeeccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI 

SEG xx 

PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc 
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Prosite for DKFZphtes3_4ol9.2 



PS00001    542->546     ASN_GLYCOSYLATION    PDOC00001 
PS00001    668->672     ASN_GLYCOSYLATION    PDOC00001 
PS00004    282->286     CAMP_PHOSPHO_SITE    PDOC00004 
PS00005    76->79       PKC_PHOSPHO_SITE     PDOC00005 
PS00005    148->151     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    244->247     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    265->268     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    278->281     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    281->284     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    285->288     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    288->291     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    299->302     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    322->325     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    414->417     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    424->427     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    481->484     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    610->613     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    671->674     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    679->682     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    900->903     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    959->962     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    987->990     PKC_PHOSPHO_SITE     PDOC00005 
PS00005    (not legible) PKC_PHOSPHO_SITE    PDOC00005 
PS00005    1049->1052   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    (not legible) PKC_PHOSPHO_SITE    PDOC00005 
PS00005    (not legible) PKC_PHOSPHO_SITE    PDOC00005 
PS00005    1146->1149   PKC_PHOSPHO_SITE     PDOC00005 
PS00005    1171->1174   PKC_PHOSPHO_SITE     PDOC00005 
PS00006    22->26       CK2_PHOSPHO_SITE     PDOC00006 
PS00006    42->46       CK2_PHOSPHO_SITE     PDOC00006 
PS00006    (not legible) CK2_PHOSPHO_SITE    PDOC00006 
PS00006    546->550     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    848->852     CK2_PHOSPHO_SITE     PDOC00006 
PS00006    (not legible) CK2_PHOSPHO_SITE    PDOC00006 
PS00006    1003->1007   CK2_PHOSPHO_SITE     PDOC00006 
PS00006    1027->1031   CK2_PHOSPHO_SITE     PDOC00006 
PS00008    11->17       MYRISTYL             PDOC00008 
PS00008    14->20       MYRISTYL             PDOC00008 
PS00008    539->545     MYRISTYL             PDOC00008 
PS00008    591->597     MYRISTYL             PDOC00008 
PS00008    746->752     MYRISTYL             PDOC00008 
PS00008    777->783     MYRISTYL             PDOC00008 
PS00008    853->859     MYRISTYL             PDOC00008 
PS00008    878->884     MYRISTYL             PDOC00008 
PS00008    882->888     MYRISTYL             PDOC00008 
PS00008    1008->1014   MYRISTYL             PDOC00008 
PS00008    1053->1059   MYRISTYL             PDOC00008 
PS00008    1083->1089   MYRISTYL             PDOC00008 
PS00190    1042->1048   CYTOCHROME_C         PDOC00169 


(No Pfam data available for DKFZphtes3_4ol9.2) 
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DKFZphtes3_50j4 
group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown, proline-rich protein 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 



1 CACTGGGCGT CTGAAGCTCA GAGCTCACCC CTGAGATGGG CTCTCCTAGG 
51 CCTCCTGGGA TGAGGGAGCC ACCAGGACCC AGTGCTGTGA TGCCTGCTCT 
101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGGGCACC CCTGAAGTCC 
151 AGCCCACCCC TGCAAAGGAC ACATGGAAGG GCAAGCGGCC TCGATCCCAG 
201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGGCCACGCC CCTCAGCCAA 
251 GCCCTCCGTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCGAACAGG 
301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCTCCTGGC 
351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 
401 CTACAAGGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 
451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 
501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 
551 GTGCGAGAGC GAAGCTGACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 
601 CAACTGCTGG CTGGGCAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 
651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 
701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 
751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 
801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG EGACACATGT 
851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 
901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 
951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 
1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGGTCAG AGCTGTGGCA 
1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 
1101 CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 
1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 



1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 

51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 

101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS 

151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 
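The ORF coordinates and peptide length quoted above are related by simple arithmetic: an inclusive span of 36 to 596 bp covers 561 nucleotides, or 187 codons. The following is a minimal sketch of that check, together with a translation of the first five codons of the ORF (nucleotides 36 to 50 of the insert as listed above). The function names are illustrative only, and the sketch assumes the standard genetic code and that the quoted span excludes the stop codon, which is what the reported lengths imply.

```python
# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    n_nucleotides = end - start + 1
    if n_nucleotides % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_nucleotides // 3


def translate(dna):
    """Translate a DNA string codon by codon (trailing partial codon ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(orf_peptide_length(36, 596))       # ORF of DKFZphtes3_50j4
print(translate("ATGGGCTCTCCTAGG"))      # nucleotides 36-50 of the insert
```

The same arithmetic applies to the other clones in this section (302 to 859 bp, and 22 to 1518 bp).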

BLASTP hits 

Entry MMU92455_1 from database TREMBL: 
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product: "WW domain binding protein 7"; Mus musculus WW domain binding 
protein 7 mRNA, partial cds. 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125 



Alert BLASTP hits for DKFZphtes3_50j4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 



Report for DKFZphtes3_50j4.3 



[LENGTH] 187 

[MW] 20353.06 

[pI] 9.76 

[PROSITE] MYRISTYL 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 6 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.56 % 
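The Pedant report above is a list of bracketed keys followed by values, with some keys (such as PROSITE and KW) repeated. As a minimal sketch, assuming that every field sits on its own line in the form `[KEY] value`, such a report could be collected into a dictionary of lists as follows. The function name `parse_pedant_report` is hypothetical and is not part of any Pedant software.

```python
import re
from collections import defaultdict


def parse_pedant_report(text):
    """Collect '[KEY] value' lines into {KEY: [value, ...]} (keys upper-cased)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Tolerate stray blanks inside the brackets, e.g. '[ PROSITE]'.
        match = re.match(r"\s*\[\s*(\w+)\s*\]\s*(.*\S)?", line)
        if match:
            fields[match.group(1).upper()].append(match.group(2) or "")
    return dict(fields)


report = """[LENGTH] 187
[MW] 20353.06
[PROSITE] MYRISTYL 1
[ PROSITE] PKC_PHOSPHO_SITE 6
[KW] All_Alpha
"""
fields = parse_pedant_report(report)
print(fields["PROSITE"])
```

Keeping a list per key preserves repeated entries in their original order.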



SEQ MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT 

SEG 

PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc 

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh 

SEQ GLCGPQR 

SEG 

PRD ccccccc 
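The LOW_COMPLEXITY keyword in the report appears to be the share of residues masked with `x` on the SEG lines. A minimal sketch of that calculation follows, assuming the mask for this protein covers 16 residues out of 187 (which reproduces the reported 8.56 %); the function name is illustrative only.

```python
def low_complexity_percent(seg_mask_lines, protein_length):
    """Percentage of residues marked 'x' across all SEG mask lines."""
    masked = sum(line.count("x") for line in seg_mask_lines)
    return round(100.0 * masked / protein_length, 2)


# SEG lines of DKFZphtes3_50j4.3: one masked stretch, remaining lines empty.
seg_lines = ["x" * 16, "", "", ""]
print(low_complexity_percent(seg_lines, 187))
```

The same function applied to a 10-residue mask in a 186-residue protein gives 5.38, matching the value reported for DKFZphtes3_50n06.2.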



Prosite for DKFZphtes3_50j4.3 



PS00005    3->6        PKC_PHOSPHO_SITE    PDOC00005 
PS00005    46->49      PKC_PHOSPHO_SITE    PDOC00005 
PS00005    70->73      PKC_PHOSPHO_SITE    PDOC00005 
PS00005    107->110    PKC_PHOSPHO_SITE    PDOC00005 
PS00005    146->149    PKC_PHOSPHO_SITE    PDOC00005 
PS00005    154->157    PKC_PHOSPHO_SITE    PDOC00005 
PS00006    54->58      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    84->88      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    94->98      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    107->111    CK2_PHOSPHO_SITE    PDOC00006 
PS00006    154->158    CK2_PHOSPHO_SITE    PDOC00006 
PS00006    175->179    CK2_PHOSPHO_SITE    PDOC00006 
PS00008    81->87      MYRISTYL            PDOC00008 
PS00009    48->52      AMIDATION           PDOC00009 



(No Pfam data available for DKFZphtes3_50j4.3) 
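The Prosite positions listed for DKFZphtes3_50j4.3 can be reproduced by scanning the peptide with the corresponding PROSITE patterns. The sketch below is not the Pedant pipeline itself. It assumes the usual pattern definitions for these four entries (PS00005 `[ST]-x-[RK]`, PS00006 `[ST]-x(2)-[DE]`, PS00008 `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`, PS00009 `x-G-[RK]-[RK]`), hand-translated to regular expressions, and it follows the table's apparent convention of reporting a hit as `start->start+length` with 1-based starts. The names `PATTERNS` and `scan` are illustrative.

```python
import re

# Peptide of DKFZphtes3_50j4, frame 3 (187 residues), as listed above.
PEPTIDE = (
    "MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE"
    "SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT"
    "PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH"
    "GLCGPQR"
)

# PROSITE patterns rewritten by hand as regular expressions (assumed forms).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
    "AMIDATION": r".G[RK][RK]",
}


def scan(sequence, regex):
    """Return (start, start + length) for every match, overlaps included."""
    hits = []
    # A lookahead lets matches overlap, as Prosite scans report them.
    for match in re.finditer("(?=(" + regex + "))", sequence):
        start = match.start() + 1  # 1-based residue numbering
        hits.append((start, start + len(match.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, ["%d->%d" % hit for hit in scan(PEPTIDE, regex)])
```

Because each of these patterns is short and degenerate, hits of this kind are expected by chance in most proteins, which is consistent with the summary's remark that no predictive Prosite motifs were found.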
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DKFZphtes3_50n06 



group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1065, polyadenylation signal at pos. 1061 

1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCGGAAGGT 
51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 
101 GACCTGCTGT GCTCACATGC CCCCCTGTCC AGCGAGGACG ACACCTCCCC 
151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 
201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCGCTCCTG 
251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 
301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 
351 GGGACAGGAC CAGGAGGAGC TACTACCTCA ATGAGATCCA GAGCTTCGCG 
401 GGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 
451 CCGCCGCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 
501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 
551 CTGGACGGCT CCGTGGACGA GAGGAAGCTG CGCGAGCTGA CGCAGCGCTA 
601 CCTGGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 
651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 
701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC AGCAGCCCGG CCGCGCTGCG 
751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTGC 
801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAAGCCCCTC 
851 TTCGCCTGGT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 
901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCGCGTCCC CCCGGAGGGG 
951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 
1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGGAGTGCT 
1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 



1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD 

51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 

101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 

151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 
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No Alert BLASTP hits found 



Alert BLASTP hits for DKFZphtes3_50n06, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50n06, frame 2 



Report for DKFZphtes3_50n06.2 

[LENGTH] 186 

[MW] 21049.39 

[pI] 9.28 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 5.38 % 

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 

SEG 

PRD ccccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 

SEG 

PRD ceeeeeeeeeeccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccch 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG 

SEG xxxxxxxxxx 

PRD hhhhhhcceeecccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhcccc 

SEQ KPLFAW 

SEG 

PRD cccccc 

(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2) 
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DKFZphtes3_50n23 
group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 
2 EST hits 

(from other testis libraries) testis specific cDNA? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872 



1 GGGCACCAGC CACTTTCCAC CATGACTGTG CGCTCGAGGG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTGA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGGAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATGTGGC AGCAGCGGCA 
351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGAGAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGAGAG CCAGAGCAGC TAGGGGAGGA 
501 TGTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC AGCAGCCTGC 
651 CCTGGGAAAG CAGAGACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GGACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTGT GCCATAAGTA CATCTTCTAT AGACGCCTCC 
1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT GAAAGAAACG 
1051 GAGGCTTCCT ACAAGGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA AGTTGCCTGC AGCCTCACCC CGGCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTGACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGAACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGGGACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATGCAGGT TCCCAGCTGG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
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Peptide information for frame 1 



ORF from 22 bp to 1518 bp; peptide length: 499 
Category: similarity to known protein 
Classification: no clue 



1 MTVRSRVADV FGSKDTESLE PVLLPLVDRR FPKKWERPVA ESLGHKDKDQ 
51 EDYFQKGGLQ IKFHCSKQLS LESSRQVTSE SQEEPWEEEF GREMRRQLWL 
101 EEEEMWQQRQ KKWALLEQEH QEKLRQWNLE DLAREQQRRW VQLEKEQESP 
151 RREPEQLGED VERRIFTPTS RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 
201 PMSPSTQQPA LGKQRPMSSV EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 
251 LTWPSLQISP ANIKKKVYHM DMEAQRKNLQ LLSEESELRL PHYLRSKALE 
301 LTTTTMELGA LRLQYLCHKY IFYRRLQSLR QEAINHVQIM KETEASYKAQ 
351 NLYIFLENID RLQSLRLQAW TDKQKGLEEK HRECLSSMVT MFPKLQLEWN 
401 VHLNIPEVTS PKPKKCKLPA ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 
451 RQQGKQMEAV WKTEVASSSY AIEKKTPASL PRDQLRGHPD IPRLLTLDV 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_50n23, frame 1 

PIR:S28589 trichohyalin - rabbit, N = 1, Score = 134, P = 5.3e-05 

TREMBLNEW:AF132479_1 product: "Ese2L protein"; Mus musculus Ese2L 
protein mRNA, complete cds., N = 1, Score = 130, P = 0.00017 



>PIR:S28589 trichohyalin - rabbit 
Length = 1,407 



HSPs: 



Score = 134 (20.1 bits), Expect = 5.3e-05, P = 5.3e-05 
Identities = 88/354 (24%), Positives = 154/354 (43%) 
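The percentages shown in these HSP headers appear to be truncated rather than rounded: 88/354 is about 24.9 % but is printed as 24 %, and 154/354 is about 43.5 % but is printed as 43 %. A minimal sketch of that calculation, with an illustrative function name, is:

```python
def blast_percent(matches, aligned_length):
    """Integer percentage as printed in the HSP header (truncated, not rounded)."""
    return (100 * matches) // aligned_length


identities = blast_percent(88, 354)
positives = blast_percent(154, 354)
print("Identities = 88/354 (%d%%), Positives = 154/354 (%d%%)" % (identities, positives))
```

The denominator is the alignment length including gap columns, so it can exceed the number of query residues covered by the HSP.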



Query: 29 RRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIK-FHCSKQLSLESSRQVTSESQEEPWE 87 
          R++ K +R + L + ++E ++ G + F +QL + + + E +EE + 
Sbjct: 165 RQYRDKEQRLQRQELEERRAEEEQLRRRKGRDAEEFIEEEQLRRREQQELKRELREEEQQ 224 

Query: 88 EEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQ 147 
          RE + L+EEE RQ++W E Q++LR+ LE++ RE+++R Q E+ + 
Sbjct: 225 RRERREQHERA-LQEEEEQLLRQRRWRE-EPREQQQLRR-ELEEI-REREQRLEQEERRE 280 

Query: 148 ESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQ 207 
          + RRE ++L E ERR ++ + EL RQQR + + 
Sbjct: 281 QQLRRE-QRL-EQEERREQQLRRELEEIREREQRLEQEERREQRLEQEERREQQLKRELE 338 

Query: 208 QPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 266 
          + +QR +E RR + ++++A GS+RW SA++K 
Sbjct: 339 EIREREQR LEQEER-REQLLAEEVREQAR--ERGESLTR-RWQRQLESEAGARQSK 390 

Query: 267 VYHMDMEAQRKNLQLLSEESELRLPHYLRSKALELTTTTM ELGALRLQYLCHKY 320 
          VY +R+ QL++ER R+LE E RQL+ 
Sbjct: 391 VYS RPRRQEEQSLRQDQERR-QRQERERELEEQARRQQQWQAEEESERRRQRLSARP 446 

Query: 321 IFYRRLQSLRQEAINHVQIMKETEASYKAQNLYI-FLENIDRLQSL-RLQAWTDKQKGLE 378 
          RQ +E Q+EE ++ + FLE ++LQ R Q ++ E 
Sbjct: 447 SLRER-QLRAEERQEQEQRFREEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQE 505 

Query: 379 EKHR 382 
          ++ R 
Sbjct: 506 DRER 509 

Score = 119 (17.9 bits), Expect = 2.2e-03, P = 2.2e-03 
Identities = 79/357 (22%), Positives = 150/357 (42%) 

Query: 33 KKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 92 
          ++ E+ + + K +++E Q+ + + +Q R+ + + + EE+F + 
Sbjct: 990 RREEQELRQERDRKFREEEQLLQE REEERLRRQERDRKFREEERQLRRQELEEQFRQ 1046 

Query: 93 EMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRR 152 
          E R+ LEE+ + Q++++K L QE K R+ E+ R +Q R QL +E++ R 
Sbjct: 1047 ERDRKFRLEEQ-IRQEKEEK-QLRRQERDRKFRE EEQQRRRQEREQQLRRERDRKFR 1101 

Query: 153 EPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSR--RPHLPMSPSTQQPA 210 
210 
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          E EQL ++ E R R L + E L+ 
Sbjct: 1102 

Query: 211 
          L +Q R + 
Sbjct: 1161 

Query: 268 YHMDMEAQ RKNLQLLS-EESELRLPHYLRSKALELTTTTMELGALRLQYL 316 
          E Q R+ QLL EE ELR + + E E LR Q 
Sbjct: 1221 

Query: 317 
          + L E ++ +E + Y+A+ + E RL+ LR + 
Sbjct: 1281 

Query: 376 
          E K RE 
Sbjct: 1339 

Score = 109, P = 1.7e-01 

Query: 67 
          +QL E R+ E Q +E EE R+ R + EEE++ Q+R+++ L QE + KL 
Sbjct: 764 

Query: 125 
          R+ EL +E++ ++ +E+E RE EQL E+ + R R L + E 
Sbjct: 823 

Score = 107 (16.1 bits), Expect = 3.0e-01, P = 2.6e-01 
Identities = 35/109 (32%), Positives = 61/109 (55%) 

Query: 71 
          L Q+ ES+EE +E +++RR+ + EEE++ Q+R+++ L QE + KLR+ 
Sbjct: 742 LREEEQLLQESEEERLRRQEREQQLRRERDRKFREEEQLLQEREEE-RLRRQERERKLRE 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
          E L +E++ ++ +E + E RE EQL + + E R R L + E 
Sbjct: 801 S--EQLLQEREEERLR-RQERERKLREEEQLLQEREEERLRRQERERKLREEE 850 

Score = 104 (15.6 bits), Expect = 9.4e-02, P = 9.0e-02 
Identities = 84/339 (24%), Positives = 149/339 (43%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEK 
          +QL E ++ +EE EE RE R++L +LEEEE Q+R++ L E++ ++ + 
Sbjct: 451 

Query: 124 LRQWNLEDLAREQQRRWVQLEKEQESPRR EP EQLGEDVE-RRI FTPTSRWRDL 175 
          R+ ++ Q RW QL++E + R +P EQL E+ E +R R R+ 
Sbjct: 508 

Query: 176 
          + EL + + R++- Q+L + R+ E +R RR 
Sbjct: 567 

Query: 232 PTKPK KSASFPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRK NLQLLSEE 285 
          + K + +R+ L+ + + + + + E +RK QLL E 
Sbjct: 625 

Query: 286 
          E RL R++ L L ELR + L+ RR Q LRQE 
Sbjct: 685 

Query: 338 
          Q+++E+E + E +L+ R + + ++++ L+E+ E L 
Sbjct: 745 EEQLLQESEEERLRRQ EREQQLRRERDRKFREEEQLLQEREEERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01 
Identities = 42/152 (27%), Positives = 74/152 (48%) 

Query: 36 
          ER + K +++E ++ +++ +++L E + + E QE 
Sbjct: 835 

Query: 95 RRQLWLEEEEMWQQRQKKWA LLEQEHQEKLRQWNLEDLAREQQ RRWVQ-LEKE 146 
          + L EE+E+ Q+R +K LL++ +E+LR+ E RE++ RR Q L +E 
Sbjct: 894 

Query: 147 
          RE EQL ++ E R R L + E 
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Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986 

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01 
Identities = 31/91 (34%), Positives = 52/91 (57%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

++L E R++ E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 
Sbjct: 642 QELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQEREEE-RLRRQERARKLRE 700 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQL 157 

E L R++++ +L +E+E RE EQL 
Sbjct: 701 E--EQLLRQEEQ ELRQERERKLREEEQL 726 

Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01 
Identities = 38/111 (34%), Positives = 57/111 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLE 130 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ + 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEE-Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ++ E R R + E L 

Sbjct: 988 LLRREEQ ELRQERDRKFREEEQLLQEREESRLRRQERDRKFREEERQL 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R++ E Q EE+ R+ R + EEE++ +Q +++ L QE KLR+ E 
Sbjct: 841 ERERKLREEEQLLRQEEQELRQERARKLREEEQLLRQEEQE LRQERDRKLREE--EQ 895 

Query: 132 LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

L R++++ +L +E++ RE EQL ++ E R R L + E 

Sbjct: 896 LLRQEEQ ELRQERDRKLREEEQLLQESEEERLRRQERERKLREEE 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R+ E Q EE E R L EEE Q +++ L QE + KLR+ E 
Sbjct: 578 EKRRRQERERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREE--EQ 635 

Query: 132 LAREQ QRRWVQLEKEQESPRREPEQLGEDVERRI 165 

L R++ Q R +L +E++ RRE ++L ++ ER++ 

Sbjct: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

++L E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE LRQERERKL 720 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEK 177 

R+ + L RE+Q L +E++ RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQEREQQLRR 768 

Score = 98 (14.7 bits), Expect = 2.6e+00, P = 9.2e-01 
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRI 165 

++ E+EQ RE E+L ++ ER++ 

Sbjct: 773 KF-- REEEQLLQEREEERLRRQERERKL 798 

Score = 97 (14.6 bits), Expect = 3.3e+00, P = 9.6e-01 
Identities = 38/129 (29%), Positives = 63/129 (48%) 

Query: 72 ESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL 129 

E R++ E Q +E EE R+ R + EEE++ +Q +++ L QE KLR+ 
Sbjct: 817 ERERKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE LRQERARKLREE-- 871 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRT 189 
E L R++++ +L +E++ RE EQL E+ + R R L + E L+ 
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Sbjct: 872 EQLLRQEEQ ELRQERDRKLREEEQLLRQEEQEL--RQERDRKLREEE-QLLQESEEE 925 

Query: 190 QSAHQSRRPHL 200 
          + Q R L 
Sbjct: 926 RLRRQERERKL 936 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 

Query: 46 KDKDQEDYFQKGGLQI-KFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 
          +++ QE F + Q+ + ++QL ESQ E + E+ G+ R QL +EE 
Sbjct: 473 RERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRWQL QEE 529 

Query: 105 MWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERR 164 
          ++R +A Q QE+LR+ E+L RE++R+ E+E+E E Q ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ--QEQLREE--EELQREKRRQ EREREYREEEKLQREEDEKRR 581 

Query: 165 IFTPTSRWRDLEK 177 
          ++R+LE+ 
Sbjct: 582 RQERERQYRELEE 594 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 

Query: 28 DRRFPKKWERPVAESL-GHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW 86 
          +R++ + E EL K +++E Q+ + ++ L Q+ + ++E 
Sbjct: 586 ERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE-L 644 

Query: 87 EEEFGREMRRQLWL EEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQL 143 
          +E R++R + L EE+E+ Q+R++K L +E Q L+ + E L R+++ R +L 
Sbjct: 645 RQERERKLREEEQLLRREEQELRQERERK LREEEQ-LLQEREEERLRRQERAR--KL 698 

Query: 144 EKEQESPRREPEQLGEDVERRI 165 
          +E++ R+E ++L ++ ER++ 
Sbjct: 699 REEEQLLRQEEQELRQERERKL 720 

Score = 95 (14.3 bits), Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 
          E LL ++ ++ ER + E + +E+ ++ K +QL + + + + 
Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 
          E + + EEE + +RR+ L +E ++ +++ LL++ +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ--S 195 
          ++ E+EQ RE E+L ++ ER++ ++ E+ L + + Q 
Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKLREEEQLLQ 830 

Query: 196 RRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 
          R + ++ L ++ + E R R ++ +R+ 
Sbjct: 831 EREEERLRRQERERKLREEEQLLRQE-EQELRQERARKLREEEQLLRQEEQELRQERDRK 889 

Query: 256 LQISPANIKKKVYHMDMEAQRK NLQLLSEESELRLPHYLRSKAL 299 
          L+ ++++ + E RK QLL E E RL R + L 
Sbjct: 890 LREEEQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKL 936 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEK L 124 
          E +R++ E Q EE+ R+ R + + EEE++ Q+R+++ L QE K L 
Sbjct: 977 ERARKLREEEQLLRREEQELRQERDRKFREEEQLLQEREEE-RLRRQERDRKFREEERQL 1035 



Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R+ LE+ R+++ R +LE EQ +E +QL R F + R ++ E L 

Sbjct: 1036 RRQELEEQFRQERDRKFRLE-EQIRQEKEEKQLRRQERDRKFREEEQQRRRQEREQQL 1092 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 



Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMR-RQLWLEEEEMWQQRQKKWALLEQEHQEK 123 

+ +L E R+ E Q +E EE R+ R R+L EEE++ + Q++ L QE+ 
Sbjct: 1250 QELRRERDRKFREEEQLLQEREEERLRRQERARKLREEEEQLLFEEQEEQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R++ E+ ARE++ R +LE+E R+E EQ R F R E+ E 

Sbjct: 1306 DRRYRAEEQFAREEKSR--RLEREL RQEEEQRRRRERERKFREEQLRRQQEE-EQRR 1359 
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Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T+Q A R E+ R++ P 

Sbjct: 1360 RQLRERQFREDQSRRQVL--EPGTRQFARVPVRSSPLYEYIQEQRSQYRP 1407 



Score 


= 93 


(14.0 bits), Expect = 8.3e+00, P = 1.0e+00 




Identities 


= 41/145 (28%), Positives = 72/145 (49%) 




Query: 


28 


DRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW- 


86 






+RR ++ER+E ++Q+ + Q+ L R + QE+ + 




Sbjct : 


408 


ERRQRQERERELEEQARRQQQWQAEEESERRRQ-RLSARPSLRERQLRAEERQEQEQRFR 


466 


Query : 


87 


-EEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE — HQEKLRQWNLEDLAREQQRRWVQ 


142 






EEE RE R++L +LEEEE Q+R++ L E++ +++ R+ ++ Q RW Q 




Sbjct: 


467 


EEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRW-Q 


525 


Query: 


143 


LEKEQESPRR EP EQLGEDVE 162 








L++E + R +P EQL E+ E 




Sbjct : 


526 


LQEEAQRRRHTLYAKPGQQEQLREEEE 552 




Score 


= 91 


(13.7 bits), Expect = 2.4e+00, P = 9.1e-01 




Identities - 


= 38/110 (34%), Positives = 57/110 (51%) 




Query: 


72 


ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL- 


129 






E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ 




Sbjct : 


931 


ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 


988 


Query: 


130 


EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 180 








++L +E+ R++ E+EQ RE E+L R F R L + EL 




Sbjct: 


989 


LRREEQELRQERDRKF--REEEQLLQEREEERLRRQERDRKFREEER— QLRRQEL 104 0 


Score 


= 89 


(13.4 bits), Expect = 2.2e+00, P - 8.9e-01 




Identities • 


= 35/138 (25%), Positives = 65/138 (47%) 




Query : 


82 


QEEPWEEEFGREMRRQLWLEEEEM--WQQRQKKKALLEQEHQEKLRQWNLEDLAREQQRR 


139 






Q E++ E+R + + +E E WQ+++++ L E+E Q K R+ + +R+ + + 




Sbjct : 


111 


QNRRQEDQRRFELRDRQFEDEPERRRWQKQEQERELAEEEEQRKKRERFEQHYSRQYRDK 


170 


Query : 


140 


WVQLEKEQ-ESPRREPEQL GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 


194 






+L++++ E RE EQL GDEF +RE+EL Q+ 




Sbjct: 


171 


EQRLQRQELEERRAEEEQLRRRKGRDAEE- - FI EEEQLRRREQQELKR- ELREEEQQRRE 


227 


Query : 


195 


SRRPHLPMSPSTQQPALGKQR 215 








R H ++ L ++R 




Sbjct: 


228 


RREQHERALQEEEEQLLRQRR 248 




Score = 50 (7.5 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTDKQKGLEEKHRE 383 
           R + R+E Q+ +E E + + LE +R Q LR + ++++ E++ R 
Sbjct: 245 RQRRWREEPREQQQLRRELEEIREREQR LEQEERREQQLRREQRLEQEERREQQLRR 301 

Query: 384 CLSSMVTMFPKLQLEWNVHLNIP-EVTSPKPKKCKLPAASPRHIRPSGPTYKQPFLSRHR 442 
           L + +L+ E + E + K +L R R ++ L+ 
Sbjct: 302 ELEEIREREQRLEQEERREQRLEQEERREQQLKRELEEIREREQRLEQEERREQLLAEEV 361 

Query: 443 ACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASLPRDQ 484 
           + AR++G+ + W+ ++ S + A + K S PR Q 
Sbjct: 362 R EQARERGESLTRRWQRQLESEAGARQSKV- YSRPRRQ 398 




Score = 40 (6.0 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL--ELTTTTMELGALRLQYLCHKYIFYRRL-QSLRQE 332 
           R+ QLL E E RL R++ L E E LR Q K+ +L Q +E 
Sbjct: 959 REEEQLLQEREEERLRRQERARKLREEEQLLRREEQELR-QERDRKFREEEQLLQEREEE 1017 

Query: 333 AINHVQI MKETEASYKAQNLYI-FLENIDRLQSLRLQAWTDKQ-KGLEEKHRE 383 
           + + +E E + Q L F + DR L Q +K+ K L + R+ 
Sbjct: 1018 RLRRQERDRKFREEERQLRRQELEEQFRQERDRKFRLEEQIRQEKEEKQLRRQERD 1073 


Score = 37 (5.6 bits), Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL ELTTTTMELG7ALRLQYLCHK Y I FYRRLQSLRQE 332 
           R+ QLL E E RL R + L E E LR Q K R+LQE 
Sbjct: 775 REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKL REEEQLLQE 831 

Query: 333 AINHVQIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRE 383 
           +EE ++ + E L+R+ ++++ L ++ +E 
Sbjct: 832 REEERLRRQERERKLREEEQLLRQEE-QELRQERARKLREEEQLLRQEEQE 881 
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Pedant information for DKFZphtes3_50n23 , frame 1 



Report for DKFZphtes3_50n23.1 

[LENGTH] 499 

[MW] 58885.69 

[pI] 9.67 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.42 % 

SEQ MTVRSRVADVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 

SEG 

PRD ccccccceeecccccccccceeeccccccccccccchhhhhhhcccccccccccccccce 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 

SEG xxxxxxxxxx. . xxxxxxxxxxxxxxxxxxx 

PRD eeeecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeccccccchhhhhhhh 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 

SEG xxxxxxxxxxxxxxx. . . 

PRD hccccccchhhhhccccccccccccccccccccccccceeeeeeccccccccccccceee 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 

SEG xxxxxxxx 

PRD ecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENID 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLQLEWNVHLNIPEVTSPKPKKCKLPA 

SEG 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccccccccccc 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASL 

SEG 

PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccc 

SEQ PRDQLRGHPDIPRLLTLDV 

SEG 

PRD ccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 
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DKF2phtes3_6b21 



group: testes derived 



DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to human KIAA0256 
gene product. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to KIAA0256 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 



1 GGCAAGCCGA CGGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 
51 CTCGCGGCAT GGCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 
101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 
151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TGTCTTCCCC AGCTCTGCAG 
201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT 
251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 
301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 
351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG 
401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 
451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 
501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 
551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 
601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 
651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 
701 TAAGAGAAGT AGTAAAACCA GCTGCAGTGT TATCAAAGGG TGAAATAGTG 
751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 
801 TCCTTCATGT ACAAGAGAGT TATCTTGCAC ACCAATGGGT TATGTTGTTC 
851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 
901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 
951 TATACCATCT TCTGAAGCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 
1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 
1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 
1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 
1151 ATGCCGAGGA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA 
1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 
1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA 
1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 
1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 
1401 ATGTGCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 
1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG 
1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA 
1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 
1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 
1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 
1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 
1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC 
1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 
1851 TAAAGAAGTG GATGCTTGTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 
1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 
1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 
2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG 
2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 
2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 
2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 
2201 CCCAGGATCA GTTCCACAAG ATGGTTGAGC TGACAGTGGC GGCCCGACAG 
2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 
2301 CAGGCCTCAG GCACCTCCCA GCCTACCCAC ACAGGGCCCC AGCTGCCCTG 
2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT 
2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATGTA CCCTGGAGCT 
2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT 
2501 GAGAGTTCTT GCCTGTGTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA 
2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 
2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT 
2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATGTG 
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2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GGCTTTCTGA 

2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTGATACT TAGAATACTT 

2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATAGCTG AGCCAAGTGA 

2851 GTGAGTTTGC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 

2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT 

2951 GGCAGAAGGG TGCAGGGCTG CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 

3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 

3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 

3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 

3151 AATGTGTATT GAAGTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 

3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 

3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 

3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 AAAAAAAAAA 
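
The polyadenylation signal position quoted above can be checked against the printed sequence. The following is a minimal sketch, assuming the canonical AATAAA hexamer is the signal meant and that only the last two numbered rows need to be searched; the names `TAIL`, `TAIL_START` and `find_polya_signal` are illustrative and not part of the listing. Under these assumptions the reported position (3300) appears to correspond to a 0-based offset, i.e. the hexamer starts at base 3301 in the 1-based row numbering.

```python
# Last two numbered rows of the insert as printed (rows 3251 and 3301).
# TAIL_START is the 1-based coordinate of the first base in TAIL.
TAIL_START = 3251
TAIL = (
    "AGAGTTTGCTTTCATATTTTCCCTGTAAGTAAGAGGGCTTATTTATTTTA"  # 3251..3300
    + "AATAAAGAGTAATTATTAAA" + "A" * 30                   # 3301..3350
)


def find_polya_signal(tail: str, tail_start: int, signal: str = "AATAAA") -> int:
    """Return the 0-based offset (in insert coordinates) of the last
    occurrence of `signal` in `tail`, or -1 if it is absent."""
    idx = tail.rfind(signal)
    if idx < 0:
        return -1
    # (tail_start - 1) converts the 1-based row label to a 0-based offset.
    return (tail_start - 1) + idx


signal_offset = find_polya_signal(TAIL, TAIL_START)
print(signal_offset)      # 0-based offset of the hexamer
print(signal_offset + 1)  # same position in 1-based row numbering
```
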



BLAST Results 



Entry HS773347 from database EMBL: 
human STS WI-18160. 
Score = 813, P = 2.9e-30, identities = 167/171 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 157 bp to 2499 bp; peptide length: 781 
Category: similarity to known protein 
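
The ORF coordinates and the peptide length are related by simple arithmetic, which can serve as a consistency check on the listing. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; `orf_peptide_length` is a hypothetical helper name.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates.

    Assumes the stop codon is NOT included in [start_bp, end_bp].
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


# ORF from 157 bp to 2499 bp: 2343 bp, i.e. 781 codons.
print(orf_peptide_length(157, 2499))
```
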



1 MVRVLRSMCL PQLCSHILSV CSGTTSDRNV YSVPGSQYLY NQPSCYRGFQ 

51 TVKHRNENTC PLPQEMKALF KKKTYDEKKT YDQQKFDSER ADGTISSEIK 

101 SARGSHHLSI YAENSLKSDG YHKRTDRKSR IIAKNVSTSK PEFEFTTLDF 

151 PELQGAENNM SEIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK 

201 NNPNESVTAN AATNSPSCTR ELSWTPMGYV VRQTLSTELS AAPKNVTSMI 

251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHIIHPTQK SKASQGSDLE 

301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 

351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 

401 PVVVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LMKKGKQREI 

451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSDDTQD GESGGDDQFP 

501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPNHTTFPK 

551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQDRMYQKD PVKAKTKRRL 

601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQNI 

651 PFVFALNRKA LGRSLNKAVP VSVVGIFSYD GAQDQFHKMV ELTVAARQAY 

701 KTMLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHYIEI 

751 WKKHLEAYSG CTLELEESLE ASTSQMMNLN L 
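
As a spot check of the reading frame, the first ten codons of the ORF (bases 157 to 186 of the cDNA above) can be translated with the standard genetic code. This is only a sketch: the codon table is built in the conventional TCAG order, and `translate` and `ORF_FIRST_30` are illustrative names rather than anything defined in the listing.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 157..186 copied from rows 151 and 181 of the cDNA listing.
ORF_FIRST_30 = "ATGGTTAGAGTCCTCAGAAGCATGTGTCTT"
print(translate(ORF_FIRST_30))  # compare with the first block of the peptide
```
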



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_6b21, frame 1 

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score = 
786, P = 3.6e-78 

TREMBL : PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, N 
= 2, Score = 161, P = 5.1e-10 

TREMBL : RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 150, P = 9.1e-07 



>SWISSPROT: Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 
Length = 635 

HSPs : 

Score = 786 (117.9 bits), Expect = 3.6e-78, P = 3.6e-78 
Identities = 190/424 (44%), Positives = 263/424 (62%) 
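
The percentages in an HSP header follow directly from the printed fractions. The sketch below parses the header line shown above and recomputes them; it assumes, based on the values in this listing (190/424 is 44.8% but is printed as 44%), that the report truncates rather than rounds. `parse_fractions` and `floor_percent` are hypothetical helper names.

```python
import re

HSP_LINE = "Identities = 190/424 (44%), Positives = 263/424 (62%)"


def parse_fractions(line: str) -> dict:
    """Map each label to (numerator, denominator, printed percent)."""
    pattern = r"(\w+) = (\d+)/(\d+) \((\d+)%\)"
    return {
        label: (int(num), int(den), int(pct))
        for label, num, den, pct in re.findall(pattern, line)
    }


def floor_percent(numerator: int, denominator: int) -> int:
    """Integer percentage, truncated toward zero."""
    return (100 * numerator) // denominator


fractions = parse_fractions(HSP_LINE)
for label, (num, den, printed) in fractions.items():
    print(label, floor_percent(num, den), printed)
```
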
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Query: 369 KKSQLPVQLDLGGMLTALEKKQHSQHAKQ— SSKPVVVSVGAVPVLSKECASGERGRRMS 426 

KK++ PVQLDLG ML ALEK+Q + A+Q +++P+ +V + ++ + S 

Sbjct: 16 KKNKTPVQLDLGDMLAALEKQQQAMKARQITNTRPLSYTVVTAASFHTKDSTNRKPLTKS 75 

Query: 427 Q-MKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVSPAFTS 485 

Q T N +D ++ KKGK++EI K K+PT+LKK+ILKER+E+K RL + S 
Sbjct: 76 QPCLTSFNSVDIASSKAKKGKEKEIAKLKRPTALKKVILKEREEKKGRLTVD--HNLLGS 133 

Query: 486 DDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPG--TELQRDTEASHL-- 541 

++ + D P++ G+ + S S+ S+ P T + + + AS 

Sbjct: 134 EEPTEMHLDFIDDLPQEIVSQEDTGLS-MPSDTSLSPASQNSPYCMTPVSQGSPASSGIG 192 

Query: 542 APN-HTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 600 

+P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+YQKDPV+AK +RRL 
Sbjct: 193 SPMASSTITKIHSKRFREYCNQVLCKEIDECVTLLLQELVSFQERIYQKDPVRAKARRRL 252 

Query: 601 VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 660 

V+GLREV KH+KL K+KCVIISPNCEKIQSKGGLD+ L+ +I A EQ IPFVFAL RKA 
Sbjct: 253 VMGLREVTKHMKLNKIKCVIISPNCEKIQSKGGLDEALYNVIAMAREQEIPFVFALGRKA 312 

Query: 661 LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRP 717 

LGR +NK VPVSVVGIF+Y GA+ F+K+VELT AR+AYK M+ ++QE E 
Sbjct: 313 LGRCVNKLVPVSVVGIFNYFGAESLFNKLVELTEEARKAYKDMVAAMEQEQAEEALKNVK 372 

Query: 718 QAPPSLP-TQGPS CPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTL ELE 766 

+ P + ++PS C P+EEYW++EG EE 

Sbjct: 373 KVPHHMGHSRNPSAASAISFCSVISEP--ISEVNEKEYETNWRNMVETSDGLEASENEKE 430 

Query: 767 ESLEASTSQ 775 

S + STS+ 
Sbjct: 431 VSCKHSTSE 439 



Pedant information for DKFZphtes3_6b21, frame 1 



Report for DKFZphtes3_6b21.1 

[LENGTH] 781 

[MW] 87393.44 

[pI] 8.94 

[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 4e-75 

[PROSITE] MYRISTYL 4 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 16 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 16 

[PROSITE] ASN_GLYCOSYLATION 6 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 8.45 % 





SEQ MVRVLRSMCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRNENTC 

SEG 

PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc 

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHHLSIYAENSLKSDG 

SEG xxxxxxxxxxxx 

PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc 

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS 

SEG 

PRD cccccchhhhheeeccccccccceeecccccccccccchhhhhhccccccccceeecchh 

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS 

SEG 

PRD hhhhhhheeeeecccceeeeccccceeeeeecccccccceeeeeccceeeeeeccccccc 

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE 

SEG 

PRD ccccceeeeehhhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch 

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP 

SEG . . . . xxxxxxxxxxxxxx 

PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc 

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeecccccc 
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SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS 

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 

SEG 

PRD ccccccccccccccccccchhhhhhcccccceeeeccccccccccccccccccccccccc 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhh 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA 

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhcccceeeeccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP 

SEG 

PRD cccccccceeeeeeeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ L 
SEG 

PRD c 



Prosite for DKFZphtes3_6b21.1 

PS00001 135->139 ASN_GLYCOSYLATION PDOC00001 
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001 
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001 
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001 
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001 
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001 
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005 
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005 
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005 
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006 
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006 
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006 
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006 
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007 
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007 
PS00008 93->99 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 380->386 MYRISTYL PDOC00008 
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PS00008 633->639 MYRISTYL PDOC00008 

PS00009 421->425 AMIDATION PDOC00009 



(No Pfam data available for DKFZphtes3_6b21 . 1 ) 
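
The ASN_GLYCOSYLATION entries in the Prosite report can be reproduced with a regular expression. The sketch below assumes the PS00001 consensus N-{P}-[ST]-{P} and scans only residues 101 to 200 of the peptide (copied from the peptide listing for frame 1); `FRAGMENT`, `FRAGMENT_START` and `motif_starts` are illustrative names. Note that the report's second coordinate appears to be one past the last matched residue, so only start positions are compared here.

```python
import re

# Residues 101..200 of the frame 1 peptide, as printed in the listing.
FRAGMENT_START = 101
FRAGMENT = (
    "SARGSHHLSIYAENSLKSDGYHKRTDRKSRIIAKNVSTSKPEFEFTTLDF"
    "PELQGAENNMSEIQKQPKWGPVHSVSTDISLLREVVKPAAVLSKGEIVVK"
)

# PS00001: N-{P}-[ST]-{P}; the lookahead allows overlapping matches.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def motif_starts(seq: str, offset: int, pattern: re.Pattern) -> list:
    """1-based start positions of every (possibly overlapping) match."""
    return [offset + match.start() for match in pattern.finditer(seq)]


sites = motif_starts(FRAGMENT, FRAGMENT_START, N_GLYCOSYLATION)
print(sites)  # compare with the first two PS00001 rows of the report
```
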
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DKFZphtes3_6cll 



group: signal transduction 

DKFZphtes3_6cll encodes a novel 1025 amino acid protein with similarity to A. ambisexualis 
antheridiol steroid receptor. 

The novel protein is a putative steroid receptor. It shares similarity with yeast YNL132w and 
contains the ATP/GTP-binding site motif A (P-loop) and RGD site, similar to the A. 
ambisexualis antheridiol steroid receptor. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this receptor. 



strong similarity to YNL132w 

strong similarity to S.pombe/YDK9_SCHPO, S.cerevisiae/YNL132w, 
C.elegans/F55A12.8 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873 



1 GCTGTGCCTT CTCTTTCGGA GTTGTTCCGT GCTCCCACGT GCTTCCCCTT 
51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 
101 CATGCATCGG AAAAAGGTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 
151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 
201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 
251 GGCTCGGCCT TCAGTGCTGT GGTGTTATAA GAAAGAGCTG GGGTTTAGCA 
301 GTCACCGGAA GAAAAGAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 
351 ACACTGAACA TAAAGCAGGA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 
401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA 
451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG 
501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 
551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 
601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 
651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 
701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 
751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG 
801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 
851 GTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT GTCTTGAAAT 
901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 
951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 
1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 
1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 
1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCAGTCTC TAAATCCTGA 
1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 
1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA 
1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG 
1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 
1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 
1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC 
1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCAGG 
1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 
1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 
1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 
1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 
1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCAGA TGCTCTCCGA 
1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC 
1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTG CCTTGAAGGG 
1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 
1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 
1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 
2001 GATTATCAAG GGATGGGCTA TGGCAGCCGT GCTCTGCAGC TGCTGCAGAT 
2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 
2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 
2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 
2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 
2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 
2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 
2351 GACGCTCACT GATGAGGATG AGGCTGACCA GGGAGGCTGG CTTGCAGCCT 
2401 TCTGGAAAGA TTTCCGACGG CGGTTCCTAG CCTTGCTCTC CTACCAGTTC 
2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT 
2501 GGGGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAAGCACTCT 
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2551 
TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG 
2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 
2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 
2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 
2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 
2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 
2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 
2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 
2951 GGAAGTAGGG AAGCTGAAGA GCATGGACCT CTCTGAATAC ATAATCCGTG 
3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 
3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 
3101 AGAACCCAAA CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 
3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 
3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACTGAACCCT CTCTGGCTGG 
3251 ACTGTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 
3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 
3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 
3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT 
3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC 
3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CGCCTGAGGC CCTTAAGTAC 
3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 
3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 
3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT 
3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 
3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 
3801 CCATTTGGGA AAAGATGTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 
3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969) 
ATP_GTP_A (284-292) 



1 MHRKKVDNRI RILIENGVAE RQRSLFVVVG DRGKDQVVIL HHMLSKATVK 
51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT 
101 NIRYCYYNET HKILGNTFGM CVLQDFEALT PNLLARTVET VEGGGLVVIL 
151 LRTMNSLKQL YTVTMDVHSR YRTEAHQDVV GRFNERFILS LASCKKCLVI 
201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 
251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG 
301 AVAFGYSNIF VTSPSPDNLH TLFEFVFKGF DALQYQEHLD YEIIQSLNPE 
351 FNKAVIRVNV FREHRQTIQY IHPADAVKLG QAELVVIDEA AAIPLPLVKS 
401 LLGPYLVFMA STINGYEGTG RSLSLKLIQQ LRQQSAQSQV STTAENKTTT 
451 TARLASARTL HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP 
501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQMLSD 
551 APAHHLFCLL PPVPPTQNAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 
601 SGDLIPWTVS EQFQDPDFGG LSGGRVVRIA VHPDYQGMGY GSRALQLLQM 
651 YYEGRFPCLE EKVLETPQEI HTVSSEAVSL LEEVITPRKD LPPLLLKLNE 
701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 
751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 
801 GKPAQPALSR EELEALFLPY DLKRLEMYSR NMVDYHLIMD MIPAISRIYF 
851 LNQLGDLALS AAQSALLLGI GLQHKSVDQL EKEIELPSGQ LMGLFNRIIR 
901 KVVKLFNEVQ EKAIEEQMVA AKDVVMEPTM KTLSDDLDEA AKEFQEKHKK 
951 EVGKLKSMDL SEYIIRGDDE EWNEVLNKAG PNASIISLKS DKKRKLEAKQ 
1001 EPKQSKKLKN RETKNKKDMK LKRKK 



BLASTP hits 



No BLASTP hits available 
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Alert BLASTP hits for DKFZphtes36cll , frame 3 

TREMBL:CEAF3130_4 gene: "F55A12.8"; Caenorhabditis elegans cosmid 
F55A12., N = 1, Score = 2782, P = 1.1e-289 

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces 
cerevisiae), N = 2, Score = 2549, P = 3.5e-273 

SWISSPROT:YXX1_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score = 
1013, P = 3.2e-102 

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN 
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296 



>SWISSPROT: YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME 
I. 

Length = 1,033 

HSPs: 

Score = 2843 (426.6 bits), Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 

Query: 1 MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 60 

M +K +D+RI LI+NG ' E+QRS FVVVGDR +DQVV LH +LS++ V ARP+VLW YK 
Sbjct: 1 MPKKALDSRIPTLIKNGCQEKQRSFFVVVGDRARDQVVNLHMLLSQSKVAARPNVLWMYK 60 

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF+SHRKKR +++K+IK G + +DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKIKKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDV 179 

M VLQDFEALTPNLLART+ETVEGGG+VV+LL +NSLKQLYT++MD+HSRYRTEAH DV 
Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIVVLLLHKLNSLKQLYTMSMDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELK 239 

RFNERFILSL +C+ CLVIDD+LN+LPIS ++ALPP +++ + ++EL+ 

Sbjct: 181 TARFNERFILSLGNCENCLVIDDELNVLPISGG-KNVKALPPTLEEDN--STQNSIKELQ 237 

Query: 240 ESLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIA 299 

ESL + P G LV KTLDQA+AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAIA 
Sbjct: 238 ESLGEDHPAGALVGVTKTLDQARAVLTFVESIVEKSLKGTVSLTAGRGRGKSAALGLAIA 297 

Query: 300 GAVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVN 359 

A+A GYSNIF+TSPSP+NL TLFEF+FKGFDAL Y+EH+DY+IIQS NP ++ A++RVN 
Sbjct: 298 AAIAHGYSNIFITSPSPENLKTLFEFTFKGFDALNYEEHVDYDIIQSTNPAYHNAIVRVN 357 

Query: 360 VFREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 

+FR+HRQTIQYI P D+ LGQAELVVIDEAAAIPLPLV+ L+GPYLVFMASTINGYEGT 
Sbjct: 358 IFRDHRQTIQYISPEDSNVLGQAELVVIDEAAAIPLPLVRKLIGPYLVFMASTINGYEGT 417 

Query: 420 GRSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEK 479 

GRSLSLKL+QQLR+QS S + NK+ + + + S RTL E+SL E IRYA GD +E 

Sbjct: 418 GRSLSLKLLQQLREQSRI--YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474 

Query: 480 WLNDLLCLDCLN-ITRIVS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537 

WLN LLCLD + ++R+ + G P P C LY V+RDTLF YH SE FLQR+M+LYVASH 
Sbjct: 475 WLNKLLCLDAASYVSRMATQGFPHPSECSLYRVSRDTLFSYHPISEAFLQRMMSLYVASH 534 

Query: 538 YKNSPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRG 597 

YKNSPNDLQ++SDAPAH LF LLPPV LP+ + VIQ+ LEG ISR+SI+NSLSRG 

Sbjct: 535 YKNSPNDLQLMSDAPAHQLFVLLPPVDLKNPKLPDPICVIQLALEGSISRESIMNSLSRG 594 

Query: 598 KKASGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFP 657 

++A GDLIPW +S+QFQD +F L G R+VRIAV P++ MGYG+RA+QLL Y+EG+F 
Sbjct: 595 QRAGGDLIPWLISQQFQDENFAALGGARIVRIAVSPEHVKMGYGTRAMQLLHEYFEGKFI 654 

Query: 658 CLEEKVLETPQEIHTVSSEAV SLLEEVITPR--KDLPPLLLKLNERPAERLDYLGVS 712 

E+ ++E+ +LEIRK +PPLLLKL+E E L Y+GVS 

Sbjct: 655 SASEEFKAVKHSLKRIGDEEIENTALQTEKIHVRDAKTMPPLLLKLSELQPEPLHYVGVS 714 

Query: 713 YGLTPRLLKFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFR 772 

YGLTP L KFWKR G+ P+YLRQT NDLTGEH+C+ML+ L D WL AF ++F 

Sbjct: 715 YGLTPSLQKFWKREGYCPLYLRQTANDLTGEHTCVMLRVLEGRDSE WLGAFAQNFY 770 

Query: 773 RRFLALLSYQFSTFSPSLALNIIQNRNMGKP AQPALSREELEALFLPYDLKRLEMY 828 

RRFL+LL YQF F+ AL+++ N G + L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RRFLSLLGYQFREFAAI TALSVLDACNNGTKYWNSTSKLTNEEINNVFESYDLKRLESY 830 
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Query. 
829 SRNMVDYHLIMDMIPAISRIYFLNQLGD-LALSAAQSALLLGIGLQHKSVDQLEKEIELP 887 
q M4.+ nVH4-T 4-P.4-4-P 4-4- +VF 4- D 4- T Q fl 4-4-TT. 4-fXT .f)4-K 4- 4-fl T.PKf T.P 
Sbjct: 831 SNNLLDYHVIVDLLPKLAHLYFSGKFPDSVKLSPVQQSVLLALGLQYKTIDTLEKEFNLP 890 


Query. 


888 




939 










Sbjct: 891 SNQLLAMLVKLSKKIMKCIDEIETKDIEEELGSNKKTESSNSKLPEFTPLQQSLEEELQE 950 
950 


Query: 940 AAKEFQ-EKHKKEVGKLKSMDLSEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEA 998 
           A E +K+ + ++DL +Y IRG++E+W KA N I R + 
Sbjct: 951 GADEAMLALREKQRELINAIDLEKYAIRGNEEDW KAAEN-QIQKTNGKGARVVSI 1004 

Query: 999 KQEPKQSKKL--KNRETKNKKDMKLKRKK 1025 
           K E +++ L +++TK K K K +K 
Sbjct: 1005 KGEKRKNNSLDASDKKTKEKPSSKKKFRK 1033 





Pedant information for DKFZphtes3_6cll, frame 3 



Report for DKFZphtes3_6cll . 3 



[LENGTH] 1025 

[MW] 115704.57 

[pI] 8.50 

[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 0.0 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0 

[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05 

[PROSITE] ATP_GTP_A 1 

[PROSITE] RGD 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 11.80 % 



SEQ MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 

SEG 

PRD cccccccchhhhhhcccccccceeeeeeeeccccceeeeehhhhhhhhhhccceeehhhh 

SEQ KELGFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFGM 

SEG 

PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee 

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDVV 

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ GRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELKE 

SEG 

PRD hhhhhhhhhhhcccceeeeeecceeeecccccccccccccccccccccccchhhhhhhhh 

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG 

SEG xxxxxxxxx 

PRD hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh 

SEQ AVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVNV 

SEG xxx 

PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhheeeeeccccccceeeeeeh 

SEQ FREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGTG 

SEG 

PRD hhhhhhheeeeccccccccccceeeehhhhhccchhhhhhhccceeeeeeeccccccccc 

SEQ RSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEKW 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh 

SEQ LNDLLCLDCLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKN 

SEG xxxxxxxxxxx 

PRD hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc 

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA 

SEG 

PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc 

SEQ SGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFPCLE 

SEG 

PRD cccchhhhhhhhhhhccccccccceeeeeeccccccccccchhhhhhhhhhhhcccchhh 

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL 
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SEG xxxxxxxxxx 

PRD hhhhhccccccchhhhhhhhhhhhhhccccccccccccccccccceeeeccccccchhhh 

SEQ KFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFRRRFLALLS 

SEG 

PRD hhhhhcccceeeeeccccccccceeeeeeecccccccccchhhhhhhhhhhhhhhhhhhh 

SEQ YQFSTFSPSLALNIIQNRNMGKPAQPALSREELEALFLPYDLKRLEMYSRNMVDYHLIMD 

SEG 

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhhhhhhhccchhhhhhhh 

SEQ MIPAISRIYFLNQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQLMGLFNRIIR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh 

SEQ KVVKLFNEVQEKAIEEQMVAAKDVVMEPTMKTLSDDLDEAAKEFQEKHKKEVGKLKSMDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ SEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQEPKQSKKLKNRETKNKKDMK 

SEG xxxxxxxxxxxxxxx 

PRD cceeecccchhhhhhhhhccccceeeeeeccchhhhhhhhcccccccccccccccchhhh 

SEQ LKRKK 

SEG xxxxx 

PRD hhccc 



Prosite for DKFZphtes3_6cll.3 

PS00016 966->969 RGD PDOC00016 

PS00017 284->292 ATP_GTP_A PDOC00017 



(No Pfam data available for DKFZphtes3_6cll.3) 
DKFZphtes3_6dl6 



group: testes derived 

DKFZphtes3_6dl6 encodes a novel 695 amino acid protein nearly identical to a sequence from 
human PAC clone WUGSC:H_DJ1185I07.2. 

The cDNA differs from the proposed gene model: it contains additional exons. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



WUGSC:H_DJ1185I07.2, differences to gene model 

differences to gene model of WUGSC:H_DJ1185I07.2: two exons skipped 

Sequenced by BMFZ 

Locus: /map="7q11.23-q21" 

Insert length: 4572 bp 

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520 



1 GGCGGCGCTA GCTTCGGAGT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC 
51 GGCGCGGCGC TCGCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAGTG 
101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA 
151 GATTGGAGCA TATGATCAAC AAATATGGGA AAAATCTGTT GAACAGAGAG 
201 AAATCAAGGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA 
251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA 
301 GCCTGAAAGT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG 
351 TATTTTTCCC CTTTTTCTTC CGGTGGTGGT TACAAGTAAC ATCAAAGGTC 
401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT 
451 ATTATTCTGC TCCACTTCTA GCCCACACAG CATACCTCTG ACAGAGGTGA 
501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT 
551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG 
601 AAGGAAATTA AGAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG 
651 GTTCTAGTAC CACAGATAAC ACACAAGAGG GAGCAGTTCA GAACCACGGT 
701 ACAAGCACCT CTCACAGCGT TGGCACTGTC TTCAGAGATC TCTGGCATGC 
751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT 
801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT 
851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT 
901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA 
951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCAGT 
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC 
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT 
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT 
1151 TCAAGTTCCA GACAGGATTC TGAGAGTGCA AGGCCAGAAT CTGAAACAGA 
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT 
1251 GTACCAGTGA GACAGATGTG GAAAATCATC AGATTAATCC ATGTGTGAAA 
1301 AAAGAATATA GAGA1GACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA 
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA 
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA 
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG 
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT 
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT 
1601 TATGTGATTG CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGGT 
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT 
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC AGCGATTACT TTTTGCAAAA 
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC 
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC 
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT 
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA 
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGAGGAAC 
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG 
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA 
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG 
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC 
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC 
2251 TGCCTGAAAG CTTGTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA 
2301 GTTGAAGTGT TTACATCAGA CTGTCTTGTG CAATTCTTAT ATTTATTTTA 
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA 
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT 
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG 
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTGAAC 
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT 
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG 
2651 TTTCTCTTGA ATTATTTTGG AACAATGCCA GGATCCAAAC TGATTAAGTT 

2701 ACAGTTTAAG CACCCTTCAG TATTAATATA TACGGTATTA TATAACAGGT 

2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTGTAAA 

2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG 

2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC 

2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG 

2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TGAGTGAGGT 

3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT 

3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTAGTGTAGG 

3101 CTATTCATAC CACACTGAAA TGAACAACTG AAGAATAAGG CTAAGAACCA 

3151 ATAAAATATT TCTCTAATTG CTAGTTGTAA AACTGTATCC AAATTTTCAG 

3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA 

3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA 

3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG 

3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG 

3401 ATTTTAAGTT GTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA 

3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA 

3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATGT 

3551 GGAATATCCT CATATTTTTA CCATATTTTA AGAACTTTAA GACGATTAAT 

3601 TGTAAATAAT TTATTTGATT GGTGCAGTTC TAATCCCTAA ATCATAATCT 

3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT 

3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT 

3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTTTAT TTGCCTATGT 

3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA 

3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA 

3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT 

3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA 

4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA 

4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA 

4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT 

4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG 

4201 GGAGTTGATT TATTAAGTAC AGTATACCTC TCAACAGTTT ATAAATAATA 

4251 TGTTGAATTA TGTCAGTGTG GGCAGCAGTA GAATACTAAA AGGAAAATGT 

4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA 

4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA 

4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG 

4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT GCCACATGTG TTGTATTTGA 

4501 AAGTTTTACA GTAAGTTTAA AATAAAACAT TCTGTGACTG AAAAAAAAAA 
4551 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 2191 bp; peptide length: 695 

Category: known protein 

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381) 
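
The ORF coordinates and the peptide length reported above agree if the interval is read as 1-based, inclusive, and excluding the stop codon. The following is a minimal sketch of that arithmetic; the function name `orf_peptide_length` and its `includes_stop` parameter are illustrative assumptions, not part of the source.

```python
def orf_peptide_length(start: int, end: int, includes_stop: bool = False) -> int:
    """Number of residues encoded by a 1-based, inclusive ORF interval.

    Assumption: the interval is in frame, so its span is a multiple of 3.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span {span} is not a multiple of 3")
    codons = span // 3
    # If the interval were to include the stop codon, it does not code a residue.
    return codons - 1 if includes_stop else codons


# Coordinates taken from this entry: ORF from 107 bp to 2191 bp.
print(orf_peptide_length(107, 2191))  # 695
```

The same check can be applied to the other ORF intervals in this listing.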



1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL 

51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF 

101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS 

151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS 

201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS 

251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE 

301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS 

351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE 

401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN 

451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV 

501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF 

551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS 

601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD 

651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2 

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 100, P = 0.08 

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, P 
= 0 



>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone 
DJ1185I07 from 7q11.23-q21, complete sequence. 
Length = 588 

HSPs: 



Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 510/515 (99%), Positives = 512/515 (99%) 



Query:  35 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 94
           GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV
Sbjct:   1 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 60

Query:  95 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 154
           TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP
Sbjct:  61 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 120

Query: 155 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 214
           KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH
Sbjct: 121 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 180

Query: 215 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 274
           AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT
Sbjct: 181 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 240

Query: 275 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 334
           GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN
Sbjct: 241 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 300

Query: 335 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 394
           EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN
Sbjct: 301 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 360

Query: 395 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 454
           PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS
Sbjct: 361 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 420

Query: 455 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 514
           HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM
Sbjct: 421 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 480

Query: 515 VIISFVVRVSLVWIFFFLLCVAERTYKQRLLFAKL 549
           VIISFVVRVSLVWIFFFLLCVAERTYKQ  L+ K+
Sbjct: 481 VIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKM 515

Score = 409 (61.4 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 92/115 (80%), Positives = 98/115 (85%) 

Query: 595 DVIVSS----AFLLTISVVFI-----CCA-----QINLYLKMEKKPNKKEELTLVNNVLK 640
           DVIV S    +F++ +S+V+I     C A     QINLYLKMEKKPNKKEELTLVNNVLK
Sbjct: 474 DVIVLSMVIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKMEKKPNKKEELTLVNNVLK 533

Query: 641 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 695
           LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS
Sbjct: 534 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 588





Pedant information for DKFZphtes3_6dl6, frame 2 



Report for DKFZphtes3_6dl6.2 



[LENGTH] 695 

[MW] 78466.68 

[pi] 9.30 

[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07 

from 7q11.23-q21, complete sequence. 0.0 

[PROSITE] CYTOCHROME_C 1 

[KW] TRANSMEMBRANE 6 

[KW] LOW_COMPLEXITY 5.32 % 

SEQ MASKVTDAIVWYQKKIGAYDQQIWEKSVEQREIKGLRNKPKKTAHVKPDLIDVDLVRGSA 

SEG 

PRD ccceeeeehhhhhhhcccchhhhhhhhhhhhhhhcccccccccccccccceeeeeeccch 

MEM 

SEQ FAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQVTSKVIFFWLLVLYLLQVAAIVLFCST 

SEG xxxxxxxxxxx 

PRD hhhhcccccccccccccceeeeecchhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeecc 

MEM MMMMM^[MMMM^11■1MMMM^1MMMMMMMMMMMM 

SEQ SSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTPKPPLSTGGKRRRKLRKAAHLEVHREG 

SEG xxxxxxxx 

PRD ccccccceeeeehhhhhhhhhhhhheeeeeeccccccccccchhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ DGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWHAAFFLSGSKKAKNSIDKSTETDNGYV 

SEG 

PRD cccccccccceeeeeeccccccccchhhhhhhhhhhhhhcccchhhhhcccccccccccc 

MEM 

SEQ SLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNTGTLRNGPSKDTQRTITNVSDEVSSEE 

SEG 

PRD cccccceeecccccccccccccccccccceeeeccccccccccccceeeecccccccccc 

MEM 

SEQ GPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPNEDAPKSGTSCSSRCSSSRQDSESARP 

SEG xxxxxxxxxxxxxxxxxx . . . 

PRD ccccceeeeeeccccccchhhhhhcccccccccccccccccccccccccccccccccccc 

MEM 

SEQ ESETEDVLWEDLLHCAECHSSCTSETDVENHQINPCVKKEYRDDPFHQSHLPWLHSSHPG 

SEG 

PRD cccchhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccccccccccc 

MEM 

SEQ LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR 

SEG 

PRD cccceeeeeecccccccceeeeehhhhhhhhhccccccccccccccccceeecccccchh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LSQATDLEQLTAHSASELYVIAFGSNEDVIVLSMVIISFVVRVSLVWIFFFLLCVAERTY 

SEG 

PRD hhhhhhhhhhhhcccceeeeeeeccccceeeehhhhhhhhcchhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ KQRLLFAKLFGHLTSARRARKSEVPHFRLKKVQNIKMWLSLRSYLKRRGPQRSVDVIVSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccceeeeeehhhhhhhhhhhhccccceeeeeeee 

MEM MMMMMMM 

SEQ AFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLKLATKLLKELDSPFRLYGLTM 

SEG 

PRD eeeeeeeeeeeeeehhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccccceeeeccc 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ NPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 

SEG 

PRD cchhhhheeeeeeeeecchhhhhccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_6dl6.2 
PS00190 375->381 CYTOCHROME_C PDOC00169 



(No Pfam data available for DKFZphtes3_6dl6.2) 
DKFZphtes3_72kll 



group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to S.pombe 
hypothetical repeat-containing protein. 

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal 
(S-K-L) signature. This sequence is responsible for transport of proteins from free polysomes 
into the microbodies. 
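
The two motif types named above can be checked against the 233-residue peptide listed for this entry. The following is a minimal sketch, assuming the PROSITE leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L and a C-terminal microbody pattern of the form [STAGCN]-[RKH]-[LIVMAFY] at the end of the sequence; both patterns are quoted from memory of PROSITE and should be treated as assumptions. The names `PEPTIDE`, `zipper_starts` and `has_microbody_cter` are illustrative only.

```python
import re

# Peptide of DKFZphtes3_72kll, frame 1, as listed in this entry (233 residues).
PEPTIDE = (
    "MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKA"
    "FREEMKIFREKIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTF"
    "WKEEKSFWEMEKSFREEEKTFWKKYRTFWKEDKAFWKEDNALWERDRNLL"
    "QEDKALWEEEKALWVEERALLEGEKALWEDKTSLWEEENALWEEERAFWM"
    "ENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA"
)

# Assumed leucine zipper pattern: four leucines spaced seven residues apart.
# The lookahead lets overlapping matches be reported.
LEUCINE_ZIPPER = re.compile(r"(?=L.{6}L.{6}L.{6}L)")

# Assumed C-terminal microbody targeting pattern, anchored at the sequence end.
MICROBODY_CTER = re.compile(r"[STAGCN][RKH][LIVMAFY]$")


def zipper_starts(seq: str) -> list[int]:
    """Return 1-based start positions of all (overlapping) zipper matches."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(seq)]


def has_microbody_cter(seq: str) -> bool:
    """True if the last three residues fit the assumed C-terminal pattern."""
    return MICROBODY_CTER.search(seq) is not None


print(zipper_starts(PEPTIDE))       # [142, 149, 156, 163, 170]
print(has_microbody_cter(PEPTIDE))  # True (the peptide ends in G-R-A)
```

The five start positions match the LEUCINE_ZIPPER entries reported for this clone. The end coordinates in the listing (for example 142-164) appear to be one past the last matched residue, since each match spans 22 residues.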

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S.pombe hypothetical repeat-containing protein 

complete cDNA, complete cds, 6 EST hits (3 from testis derived 
libraries) 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1134 bp 

Poly A stretch at pos. 1124, polyadenylation signal at pos. 1088 



1 AACCTTTCAA GTGCCCCCTC CTTTCCTTAA AGTCTTTTAT AGGGGTCCCC 

51 TTCTTGGCCA TCTCCATCCT GTGAGTCAGG ACTGAAAGGG CACAGACAGG 

101 TCACTGCCAG CATTGTTGGG GCAAGCCTGC AAGCACGCAT CACTGGGGAT 

151 CTGACATGAC AATGGCCGCC TGCCCCCTCT GAGGGCTACA GGACTTACCC 

201 CAGTGGGAAG CAGCTAAGCA GGTCTGACCA GCCGACCTGG ACCTGGCCAA 

251 GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG 

301 ATGTTTTCCT TCAAGGTGAG CAGATGGATG GGGCTTGCCT GCTTCCGGTC 

351 CCTGGCGGCA TCCTCTCCCA GTATTCGCCA GAAGAAACTA ATGCACAAGC 

401 TGCAGGAGGA AAAGGCTTTT CGCGAAGAGA TGAAAATTTT TCGTGAAAAA 

451 ATAGAGGACT TCAGGGAAGA GATGTGGACT TTCCGAGGCA AGATCCATGC 

501 TTTCCGGGGC CAGATCCTGG GTTTTTGGGA AGAGGAGAGA CCTTTCTGGG 

551 AAGAGGAGAA AACCTTCTGG AAAGAGGAAA AATCCTTCTG GGAAATGGAA 

601 AAGTCTTTCA GGGAGGAAGA GAAAACTTTC TGGAAAAAGT ACCGCACTTT 

651 CTGGAAGGAG GATAAGGCCT TCTGGAAAGA GGACAATGCC TTATGGGAAA 

701 GAGACCGGAA CCTTCTTCAG GAGGACAAGG CCCTGTGGGA GGAAGAAAAG 

751 GCCCTGTGGG TAGAGGAAAG AGCCCTCCTT GAGGGGGAGA AAGCCCTGTG 

801 GGAAGATAAA ACGTCCCTCT GGGAGGAAGA GAATGCCCTC TGGGAGGAAG 

851 AGAGGGCCTT CTGGATGGAG AACAATGGCC ACGTTGCCGG AGAGCAGATG 

901 CTCGAAGATG GGCCCCACAA CGCCAACAGA GGGCAGCGCT TGCTGGCCTT 

951 CTCCCGAGGC AGGGCGTAGC CAGCATGCAG GTGCAGGGCC CTGTGGTCCA 

1001 GACTCCCCTG GGTTGGGATT CAAGTCCAGG GTGAGCCCAT GTGCTGGAGA 

1051 AAATACACAC TCATTGGTCT CCTTGCTTTG AAAGATCCAA TAAAGTCCTG 

1101 AGGCAAGGTT TGGAAAACCA ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164) 
LEUCINE_ZIPPER (149-171) 
LEUCINE_ZIPPER (156-178) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (170-192) 
1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA 

51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF 

101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL 

151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM 

201 ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA 
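
The start of this peptide can be reproduced from the nucleotide listing by translating from the reported ORF start (268 bp). The following is a minimal sketch using the standard genetic code; the helper names `CODON_TABLE` and `translate` are illustrative, and only the sequence row beginning at position 251 is used.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Row of the DKFZphtes3_72kll listing that starts at position 251.
ROW_251 = "GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG"
ROW_START = 251
ORF_START = 268  # from "ORF from 268 bp to 966 bp"

fragment = ROW_251.replace(" ", "")[ORF_START - ROW_START:]
print(translate(fragment))  # MATPPFRLIRK
```

The eleven residues obtained agree with the first residues of the peptide listed above.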



BLASTP hits 



Entry SPCC330_4 from database TREMBLNEW: 

gene: "SPCC330.04c"; product: "hypothetical repeat-containing protein"; 
S.pombe chromosome III cosmid c330. 

Score = 149, P = 1.6e-08, identities = 55/187, positives = 88/187 

Entry A45973 from database PIR: 
trichohyalin - human 

Score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194 



Alert BLASTP hits for DKFZphtes3_72kll, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll, frame 1 



Report for DKFZphtes3_72kll.1 



[LENGTH] 233 

[MW] 28752.65 

[pi] 5.70 

[PROSITE] LEUCINE_ZIPPER 5 

[PROSITE] MICROBODIES_CTER 1 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 15.45 % 



SEQ MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE 

SEG 

PRD cccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT 

SEG xxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED 

SEG 

PRD hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA 

SEG . . . xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhccccchhhhhhcccccccccchhhhhhhhccc 



Prosite for DKFZphtes3_72kll.1 

PS00005  14->17    PKC_PHOSPHO_SITE  PDOC00005 
PS00005  35->38    PKC_PHOSPHO_SITE  PDOC00005 
PS00005  71->74    PKC_PHOSPHO_SITE  PDOC00005 
PS00005  113->116  PKC_PHOSPHO_SITE  PDOC00005 
PS00006  106->110  CK2_PHOSPHO_SITE  PDOC00006 
PS00006  113->117  CK2_PHOSPHO_SITE  PDOC00006 
PS00006  183->187  CK2_PHOSPHO_SITE  PDOC00006 
PS00008  81->87    MYRISTYL          PDOC00008 
PS00342  231->234  MICROBODIES_CTER  PDOC00299 
PS00029  142->164  LEUCINE_ZIPPER    PDOC00029 
PS00029  149->171  LEUCINE_ZIPPER    PDOC00029 
PS00029  156->178  LEUCINE_ZIPPER    PDOC00029 
PS00029  163->185  LEUCINE_ZIPPER    PDOC00029 
PS00029  170->192  LEUCINE_ZIPPER    PDOC00029 



(No Pfam data available for DKFZphtes3_72kll.1) 
DKFZphtes3_72kl5 



group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus 
norvegicus actin-filament binding protein Frabin. 

FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel F-actin-binding protein. The 
gene locus fgd1 seems to be responsible for faciogenital dysplasia or Aarskog-Scott syndrome. 
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in 
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase 
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 
and the actin cytoskeleton. Cdc42p is an essential GTPase. In yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated protein 
morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events and 
induces the JNK/SAPK protein kinase cascade, which leads to the activation of transcription 
factors within the nucleus. 

The novel protein seems to be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as well as 
in modulating the JNK/SAPK pathway. 



strong similarity to actin-filament binding protein Frabin 

2 EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1845 bp 

Poly A stretch at pos. 1835, polyadenylation signal at pos. 1816 



1 GTGATGGAGA GTGCTGTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA 

51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT 

101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA 

151 CACATACACT TGGTTATTAA GAATGGGAGC AGCAAGGAGT ATGGCAAGAA 
201 CACAGTGAGT TTTCCCTTGA GTGTGTGAGG AAGCCCTCAG AGTTTGTGAC 

251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC 

301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT 

351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA 

401 GGACTGGCTA CACTGTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA 

451 AAAATGTTAG GAAGAGATGA TAAATACGTA AGTATTATAT CTAACTAAGT 

501 CTTTACTAAC TAGTCACATT ATTAAACAGT GCAAGGATCA AGAAAAGTTA 

551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG 

601 AAATGTTCCA GTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA 
651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGGT CTACACTAAG 

701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA 

751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT 
801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT 

851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC 
901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC 
951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATGATACAG ATAAGACTCA 

1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA 

1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT 

1101 CAAGCTTCTG AACCCTTGCT TGATACGCAC ATAGTGAATG GAGAAAGAGA 

1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA 

1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC 

1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA 

1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA 

1351 AGGTAGAGCA TGAGACTAGC TCATGAGCAG GGAAAACCCT GCCTATTCGA 

1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA 

1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA 

1501 GCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG 

1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG GAGTCAGAAC AGTGTGGAAA 

1601 CTTTAATATA GGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT 

1651 GTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC 

1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA 

1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG 

1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA 



BLAST Results 



No BLAST result 
Medline entries 



98334590: 

Frabin, a novel FGD1-related actin filament-binding protein capable of 
changing cell shape and activating c-Jun N-terminal kinase. 



Peptide information for frame 3 



ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility 

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 

51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 

101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 

151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72kl5, frame 3 

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; 
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete 
cds., N = 1, Score = 428, P = 1.8e-39 



>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 
Length = 766 

HSPs: 

Score = 428 (64.2 bits), Expect = 1.8e-39, P = 1.8e-39 
Identities = 90/174 (51%), Positives = 115/174 (66%) 

Query:  12 SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71
           S LS+Y+D++K+S +NLN P+TP +HGLT+T  QKL S   PQ+Q  D+D+ QG   C+A
Sbjct:  31 SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90

Query:  72 NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131
           NGV AAQ+QMECE EK A LS +T  Q +    D H++NG R+ET T  AS  T+S D N
Sbjct:  91 NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185
           A DSS RT G    LP +E     E ++QERENG S   L  LDQHHE+K  +E
Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204
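
The identity and positive percentages in the HSP header above (90/174 given as 51%, 115/174 given as 66%) are consistent with integer truncation rather than rounding. The following is a minimal sketch of that arithmetic and of counting identical columns in an aligned pair; the function names are illustrative, and the truncation rule is an inference from the numbers in this entry, not a documented property of the search program.

```python
def percent_truncated(count: int, total: int) -> int:
    """Integer percentage, truncated (90/174 gives 51, not 52)."""
    return (100 * count) // total


def count_identities(query: str, sbjct: str) -> int:
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # A gap character in both rows is not counted as an identity.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


print(percent_truncated(90, 174))   # 51
print(percent_truncated(115, 174))  # 66

# First ten aligned columns of the last alignment block above.
print(count_identities("ASDSSYRTPG", "ACDSSCRTQG"))  # 7
```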



Pedant information for DKFZphtes3_72kl5, frame 3 



Report for DKFZphtes3_72kl5.3 



[LENGTH] 188 

[MW] 20388.32 

[pi] 4.62 

[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus 

norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38 

[KW] All_Alpha 

[KW] SIGNAL_PEPTIDE 16 

[KW] LOW_COMPLEXITY 12.77 % 



SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT 

SEG . xxxxxxxxxxxxxx 

PRD ccchhhhhcccccccccccccccccccccccccccccccccccchhhhhhhccccccccc 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP 
SEG xxxxx 

PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 

SEG xxxxx 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 

SEG 

PRD hhhhhccc 

(No Prosite data available for DKFZphtes3_72kl5.3) 
(No Pfam data available for DKFZphtes3_72kl5.3) 
DKFZphtes3_72pl6 



group: intracellular transport and trafficking 

DKFZphtes3_72pl6 encodes a novel 796 amino acid protein with very strong similarity to the Mus 
musculus maternal-embryonic 3 (Mem3) gene product. 

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and 
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also actively 
transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is similar to 
yeast VPS35 (vacuolar protein sorting 35). In yeast, the null allele of VPS35 results in a 
differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), 
proteinase B (PrB), and alkaline phosphatase (ALP). 

The new protein can find application in modulating the sorting of proteins into different 
compartments. 



strong similarity to mouse Mem3 and yeast VPS35 
Sequenced by DKFZ 
Locus: /map="16p13.3" 
Insert length: 2707 bp 

Poly A stretch at pos. 2697, no polyadenylation signal found 



1 CTACGCGCGG GGCGGGTGCT GCTTGCTGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC AGCAGTCCCC TCAGGATGAG CAGGAAAAGC TCTTGGATGA 
101 AGCCATACAG GCTGTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGG 
151 ACAAAAACAA GCTTATGGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCAAAG AGTTACTATG AACTTTATAT 
251 GGCCATTTCT GATGAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAAGGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA 
401 TGTCAAGTCA TTTCCTCAGT CCAGGAAGGA TATTTTGAAA GATTTGGTAG 
451 AAATGTGCCG TGGTGTGCAA CATCCCTTGA GGGGTCTGTT TCTTCGAAAT 
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTAGTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 
751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA 
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA 
851 GGTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGG 
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 
1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTGGCTACAG 
1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTGT ATCTTTACAA 
1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 
1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 
1201 TIGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 
1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTGAAATT 
1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 
1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TGGATTATAA CACAGAAATT 
1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 
1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG 
1501 CTGATGAGCA GAGCCTTGTG GGCCGCTTCA TTCATCTGCT GCGCTCTGAG 
1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 
1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 
1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 
1701 AAATGGGAAA AGAAATGCCA GAAGATTTTT TCATTTGCCC ACCAGACTAT 
1751 CAGTGCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 
1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 
1851 GTCGCATATG AATTCATGTC CCAGGCATTT TCTCTGTATG AAGATGAAAT 
1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 
1951 TTGAAAGGAT GAAGTGCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 
2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 
2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 
2101 AAAATGGGGA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 
2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 
2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 
2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 
2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 
2351 ACATTTTCAT AACACACTGG AGCATTTGCG CTTGCGGCGG GAATCACCAG 
2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 
2451 TCACCATACT CCTTTCCATG TACATCCAGT GAGGGTTTTA TTACGCTAGG 
2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGGT AGGTTTCCCA 
2551 TTTCTTACCT GTGATGTGTT TTACCCAGCA CCTCCGGACA CTCACCTTCA 
2601 GGACCTTAAT AAAATTATTC ACTTGGTAAG TGTTCAAGTC TTTCTGATCA 
2651 CCCCAAGTAG CATGACTGAT CTGCAATTTA AAATTCCTGT GATCTGTAAA 
2701 AAAAAAA 



BLAST Results 



Entry AC007225 from database EMBLNEW: 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 

Score = 1081, P = 2.8e-217, identities = 219/221 
13 exons 

Entry HS015146 from database EMBL: 
human STS WI-8848. 
Score = 2033, P = 2.9e-87, identities = 425/436 



Medline entries 



96327632: 

Genetic mapping and embryonic expression of a novel, maternally 
transcribed gene Mem3. 

97258867: 

Endosome to Golgi retrieval of the vacuolar protein sorting receptor, 
Vps10p, requires the function of the VPS29, VPS30, and VPS35 gene products. 

92360909: 

Alternative pathways for the sorting of soluble vacuolar proteins in 
yeast: a vps35 null mutant missorts and secretes only a subset of vacuolar hydrolases. 

10198044: 

Distinct Domains within Vps35p Mediate the Retrieval of Two Different 
Cargo Proteins from the Yeast Prevacuolar/Endosomal Compartment 



Peptide information for frame 3 



ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 



1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML 

51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY 

101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR

151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ 

201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV 

251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII 

301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL 

351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR 

401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE

451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS 

501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD 

551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE 

601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR 

651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL 

701 KKALKIANQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI 

751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72pl6, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05.7"; Arabidopsis thaliana BAC
TM017A05., N = 2, Score = 927, P = 1.9e-162

PIR:S56936 vacuolar protein-sorting protein VPS35 - yeast
(Saccharomyces cerevisiae), N = 3, Score = 826, P = 1.5e-116

TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus 
maternal-embryonic 3 (Mem3) mRNA, complete cds., N = 1, Score = 3376, P 
= 0 

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar 
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt] , N = 
3, Score = 813, P = 4.4e-115 

>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus
maternal-embryonic 3 (Mem3) mRNA, complete cds. 
Length = 754 

HSPs : 

Score = 3376 (506.5 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 666/721 (92%), Positives = 682/721 (94%)

Query:  78 EVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 137
           +VYLTDEFAKG ++ADLYELVQY+GNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC
Sbjct:  34 KVYLTDEFAKGERLADLYELVQYSGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC 93

Query: 138 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 197
           RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM
Sbjct:  94 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM 153

Query: 198 QHQGHSRDREKRERERQELRILVGTNLVRLSQLEG-VNVERYKQIVLTGILEQVVNCRDA 256
           QHQGHSRDREKRERERQELRILVGTNLV L+ +         +QIVLTGILEQVVNCRDA
Sbjct: 154 QHQGHSRDREKRERERQELRILVGTNLVALTLVSWRCKCGTLQQIVLTGILEQVVNCRDA 213

Query: 257 LAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREDGP 316
           LAQE  MECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHRE  P
Sbjct: 214 LAQEISMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHREMEP 273

Query: 317 GIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 376
           GIPA++KLFDIFSQQVATVIQSR+DMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT
Sbjct: 274 GIPAELKLFDIFSQQVATVIQSRRDMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT 333

Query: 377 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESR--K 434
           VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES   K
Sbjct: 334 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESSPGK 393

Query: 435 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 494
           SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF
Sbjct: 394 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF 453

Query: 495 IHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKVDDKWE 554
           IHLLRS+DPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSK     +
Sbjct: 454 IHLLRSDDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKWMTSGK 513

Query: 555 KKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 614
           +  ++ F   HQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY
Sbjct: 514 RNARRYFHLPHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY 573

Query: 615 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKLLKKPDQGRAVSTCA 674
           EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ      C
Sbjct: 574 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTECALAASKLLKKPDQAEREHMCT 633

Query: 675 HLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 734
            L WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE
Sbjct: 634 SL-WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE 692

Query: 735 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLRRESPESEGPIYEGL 794
           NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLR RRESPESEGPIYEGL
Sbjct: 693 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRTRRESPESEGPIYEGL 752

Query: 795 IL 796
           IL
Sbjct: 753 IL 754
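
The identity figures quoted for an HSP can be recomputed from the aligned Query and Sbjct strings. The following is a minimal sketch, assuming both strings come from the same alignment block and use `-` for gaps; the helper name `alignment_stats` is hypothetical. It is applied here to the residues of the first block (Query 78-137 against Sbjct 34-93).

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical columns, total columns) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    # A column counts as an identity only if both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identities, len(query)


query = "EVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC"
sbjct = "KVYLTDEFAKGERLADLYELVQYSGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC"

identical, columns = alignment_stats(query, sbjct)
print(f"{identical}/{columns} identical ({100 * identical / columns:.0f}%)")
```

Summing the per-block counts over all blocks of an HSP gives a figure that can be compared with the numerator of the reported Identities line.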

Pedant information for DKFZphtes3_72pl6, frame 3 
Report for DKFZphtes3_72pl6.3

[LENGTH] 796
[MW] 91723.67
[FUNCAT] 08.01 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110
[BLOCKS] BL01092Q
[PIRKW] yeast vacuole 1e-108
[PIRKW] membrane protein 1e-108
[KW] TRANSMEMBRANE 1
[KW] LOW COMPLEXITY 5.40 %



SEQ MPTTQQSPQDEQEKLLDEAIQAVKVQSFQMKRCLDKNKLMDSLKHASNMLGELRTSMLSP 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ KSYYELYMAISDELHYLEVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVK

SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceeeeeceeeee 

MEM MMMMMMMMMMMMMM 

SEQ SFPQSRKDILKDLVEMCRGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSM 

SEG xxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccccccccccccccch 

MEM MMMMMMMMMM

SEQ DFVLLNFAEMNKLWVRMQHQGHSRDREKRERERQELRILVGTNLVRLSQLEGVNVERYKQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhccchhhhhh 

MEM 

SEQ IVLTGILEQVVNCRDALAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh 

MEM 

SEQ ALIDRLALFAHREDGPGIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMK 

SEG 

PRD hhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhh 

MEM 

SEQ CYPDRVDYVDKVLETTVEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKH

SEG 

PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh 

MEM 



SEQ FHPLFEYFDYESRKSMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPD 

SEG xxxxxxxxxxxx 

PRD hhhheeecccchhhhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc 

MEM 

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLA 

SEG xxx 

PRD ccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhcccccceeeeeccchhhhhhhhh 

MEM 

SEQ FRYKENSKVDDKWEKKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHE 

SEG 

PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 



MEM 

SEQ TVAYEFMSQAFSLYEDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKL 

SEG 

PRD eeeeehhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

MEM 

SEQ LKKPDQGRAVSTCAHLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLF 

SEG 

PRD hhcccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh 

MEM 

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLR

SEG

PRD hhhhhhhhhhhccccceeeeehhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RESPESEGPIYEGLIL

SEG 

PRD hhcccccccceeeccc 

MEM 



(No Prosite data available for DKFZphtes3_72pl6.3)
(No Pfam data available for DKFZphtes3_72pl6.3)

DKFZphtes3_7b22



group: cell structure and motility 

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins. 

The novel protein is related to paramyosin, a major structural component of the thick filaments
of invertebrate muscle. Paramyosins are promising antigens for immunization against several
parasites, such as Schistosoma mansoni.

The new protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.



similarity to paramyosins 



complete cDNA, complete cds, few EST hits 



Sequenced by BMFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213



1 GGAAGAAAGG CTAGCGGGCG TTGGCCGTAT GTGGGTGTCT TGAGGCAGTT 

51 TTTCAGTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 

101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 

151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TGCACCTGAG ATAAGGGGGA 

201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA 

251 TACAAAATGG GAAATTGGGA CAAATCCCAG TGGCTCATGA CACTAAGAAG 

301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 

351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGG GTAACGAAGC 

401 TACAGAAGAA TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA 

451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 
501 CCGTAGAAGA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 

551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 

601 CTCGGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 

651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA

701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC

751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC

801 CAGAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 

851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC

901 ACTGAAGAAA ATTCAGATTG ATAGGCAGTT TTTCAGCGAT GTGATTGCAG 

951 ATACCATTAA GGAGTTGCAA GATTCGGCCA CTTACAACAG TCTCCTGCAA 

1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 

1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 

1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 

1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATGAAGGCAA AATCCAACTT 

1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT GCCCAGACCC 

1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 

1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT 

1351 GTTCCTTAGA AAGGAGCAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 

1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT

1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA

1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCGT ATAGAAAAGG 

1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 

1601 ATAAAGCTCC AGGCCTGGTG GCGAGGCACT ATGATACGGA GAGAAATTGG 

1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTGA TAGCAAGGAT TCAAAAGGCA 

1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 

1751 CTTTTGTGTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 

1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 

1851 TGAGACTTTC CCAGGGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 

1901 TGCCTGTTAG GTGGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT 

1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 

2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 

2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 

2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 

2151 TCAGTAGGAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 

2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 

2251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A



BLAST Results 



Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA.
Score = 2262, P = 1.3e-97, identities = 462/468



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 
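
The start of this ORF can be checked against the nucleotide listing by translating from position 410. The sketch below assumes the standard genetic code and 1-based positions; `CODON_TABLE`, `translate` and `row_401` are illustrative names, and `row_401` holds nucleotides 401-450 as listed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 401-450 of the insert, spaces removed.
row_401 = "TACAGAAGAATGGAAGAAGACAGCCTGGAAGACTCAAACCTTCCTCCAAA"

# Position 410 is index 9 within this row (410 - 401).
print(translate(row_401[410 - 401:]))
```

The translated prefix can be compared with the first residues of the peptide listed for frame 2.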



1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP

51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN

101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV

151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS

201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN

251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM

301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA

351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL

401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_7b22, frame 2
SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08



PIR:A44972 paramyosin - nematode (Dirofilaria immitis) (fragment), N = 
1, Score = 157, P = 7.1e-08 

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08

PIR:S52537 emm L 15 protein - Streptococcus pyogenes, N = 1, Score = 
151, P = 8.6e-08 



>SWISSPROT:MYSP_BRUMA PARAMYOSIN. 
Length = 880 

HSPs : 



Score = 158 (23.7 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 66/259 (25%), Positives = 125/259 (48%)

Query: 142 EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALSK 201
           +  K + L  K   R    T E   K++   +  +D +A   + LQ  A  N LL+ +
Sbjct: 169 QLKKDKHLAEKAAERFEAQTVELSNKVEDLNRHVND-LAQQRQRLQ--AENNDLLKEIHD 225

Query: 202 ER---ENKMHF-YDIIAREEKGRKQIISLQKQLINVKKEWQFEVQSQNEYIANLKDQLQE 257
           ++   +N  H  Y +  + E+ R+++   +++   ++ +   +VQ + + +    D+  E
Sbjct: 226 QKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLH-QVQLELDSVRTALDE--E 282

Query: 258 MKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRMKT-EEEARTHTEIEMFL 316
             A++  E++    NTE  I Q + K +    L  EE+E LR K  +++A    +IE+ L
Sbjct: 283 SAARAEAEHKLALANTE--ITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIEIML 340

Query: 317 RKEQQ--KLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQDLAKMIREYEQVII 374
           +K  Q  K + RL+  +E    D E  QN +  L+  K       + L K + E +  I
Sbjct: 341 QKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAK-------EQLEKTVNELKVRID 393

Query: 375 EDRIEKERSKKKVKQDLLELKSVIKL 400
           E  +E E ++++ +  L EL+ +  L
Sbjct: 394 ELTVELEAAQREARAALAELQKLKNL 419




Score = 118 (17.7 bits), Expect = 1.3e-03, P = 1.3e-03
Identities = 54/231 (23%), Positives = 108/231 (46%)

Query: 181 DTIKELQDSATYNSLLQ----ALSKERENKMHFYDIIAREEKG-RKQIISLQKQLINVKK 235
           D +KE+ D       LQ     L+++ E      +   RE    + Q+  +Q +L +V+
Sbjct: 218 DLLKEIHDQKVQLDNLQHVKYQLAQQLEEARRRLEDAERERSQLQAQLHQVQLELDSVRT 277

Query: 236 EWQFE--VQSQNEY-IANLKDQLQEMKAKSNLENRYMKTNTE-LQIAQTQKKCNRTEELL 291
               E   +++ E+ +A    ++ + K+K + E        E L+    QK+    E++
Sbjct: 278 ALDEESAARAEAEHKLALANTEITQWKSKFDAEVALHHEEVEDLRKKMLQKQAEYEEQIE 337

Query: 292 VEEIEKLRMKTEEEARTHTEIEMF---LRKEQQKLE--ERLEFWMEKYDKDTEMKQNELN 346
           +  ++K+    + ++R  +E+E+    L K Q  +   ER +  +EK   + +++ +EL
Sbjct: 338 IM-LQKISQLEKAKSRLQSEVEVLIVDLEKAQNTIAILERAKEQLEKTVNELKVRIDELT 396

Query: 347 A-LKATKASDLAHLQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVI 398
             L+A +    A L +L K+   YE+ + E +    R  KK++ DL E K  +
Sbjct: 397 VELEAAQREARAALAELQKLKNLYEKAV-EQKEALARENKKLQDDLHEAKEAL 448




Score = 107 (16.1 bits), Expect = 2.1e-02, P = 2.1e-02
Identities = 49/279 (17%), Positives = 124/279 (44%)

Query: 123 ITEEGPNLPEIRHRGRFAV-EFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIAD 181
           I E    L   +   R A+ E  K+++L  K   ++  +  E  KK+Q D     + +AD
Sbjct: 392 IDELTVELEAAQREARAALAELQKLKNLYEKAVEQKEALAREN-KKLQDDLHEAKEALAD 450

Query: 182 TIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQ--IISLQKQLINVKKEWQF 239
             ++L +    N+ L    +E +  +   +   R+ + R Q  +  LQ+  I +++  Q
Sbjct: 451 ANRKLHELDLENARLAGEIRELQTALKESEAARRDAENRAQRALAELQQLRIEMERRLQE 510

Query: 240 EVQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTE-ELLVEEIEKL 298
           + +       N++ ++  + A   L +   +   E+   + + +    E E+ V+ + +
Sbjct: 511 KEEEMEALRKNMQFEIDRLTAA--LADAEARMKAEISRLKKKYQAEIAELEMTVDNLNRA 568

Query: 299 RMKTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAH 358
            ++ ++  +  +E    L+   + LE+  +  +++Y     + Q +++AL A +  +
Sbjct: 569 NIEAQKTIKKQSEQLKILQASLEDTQRQLQQTLDQY----ALAQRKVSALSA-ELEECKV 623

Query: 359 LQDLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQ 401
             D A   R+  ++ +E+   +      V  +L  +K+ ++ +
Sbjct: 624 ALDNAIRARKQAEIDLEEANGRITDLVSVNNNLTAIKNKLETE 666





Pedant information for DKFZphtes3_7b22, frame 2 



Report for DKFZphtes3_7b22.2



[LENGTH] 443
[MW] 51917.95
[pI] 6.18
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 3e-08
[PIRKW] phosphotransferase 6e-06
[PIRKW] citrulline 8e-06
[PIRKW] tandem repeat 1e-07
[PIRKW] heart 6e-06
[PIRKW] polymorphism 4e-06
[PIRKW] serine/threonine-specific protein kinase 6e-06
[PIRKW] DNA binding 8e-08
[PIRKW] muscle contraction 1e-07
[PIRKW] ATP 3e-08
[PIRKW] thick filament 1e-07
[PIRKW] phosphoprotein 3e-08
[PIRKW] glycoprotein 4e-06
[PIRKW] skeletal muscle 1e-07
[PIRKW] calcium binding 8e-06
[PIRKW] alternative splicing 3e-08
[PIRKW] coiled coil 3e-08
[PIRKW] P-loop 3e-08
[PIRKW] heptad repeat 4e-06
[PIRKW] methylated amino acid 3e-08
[PIRKW] basement membrane 4e-06
[PIRKW] cardiac muscle 6e-06
[PIRKW] extracellular matrix 4e-06
[PIRKW] hydrolase 3e-08
[PIRKW] membrane protein 4e-06
[PIRKW] EF hand 8e-06
[PIRKW] cytoskeleton 8e-06


r PTRKWl 


V]31 r ft P — 0 fi 
Halt O C V <J 


r qnppAMl 




r QTipraMi 

Lourr rtiM j 


unassigned Ser/Thr or Tyr — specific protein kinases 6e — 06 


r QTIPFIM 1 
[out r rti J J 


Laiiii'juuiiii .l t: j_i e; ci i_ iiuiuuj.uy y 0 c uu 


r cup ftam l 


iiLyuaiii niuLui uuuiaiii 11LJ111LJ _l vjy y — > c uo 


r c n p FAM 1 


trichohyalin 8e — 0 6 


[SUPFAM] protein kinase homology 6e-06
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All Alpha
[KW] LOW COMPLEXITY 10.61 %



SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD 

SEG xxxxxxxxxxxxxxxxxxxxxxx . 

PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeeccccccccccccc

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK 

SEG x 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccc 



Prosite for DKFZphtes3_7b22.2 



PS00001  285->289  ASN_GLYCOSYLATION  PDOC00001
PS00004  152->156  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  164->167  PKC_PHOSPHO_SITE   PDOC00005
PS00005  182->185  PKC_PHOSPHO_SITE   PDOC00005
PS00005  280->283  PKC_PHOSPHO_SITE   PDOC00005
PS00005  383->386  PKC_PHOSPHO_SITE   PDOC00005
PS00006    5->9    CK2_PHOSPHO_SITE   PDOC00006
PS00006   30->34   CK2_PHOSPHO_SITE   PDOC00006
PS00006   41->45   CK2_PHOSPHO_SITE   PDOC00006
PS00006   57->61   CK2_PHOSPHO_SITE   PDOC00006
PS00006  104->108  CK2_PHOSPHO_SITE   PDOC00006
PS00006  182->186  CK2_PHOSPHO_SITE   PDOC00006
PS00006  243->247  CK2_PHOSPHO_SITE   PDOC00006
PS00006            CK2_PHOSPHO_SITE   PDOC00006
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006
PS00006  302->306  CK2_PHOSPHO_SITE   PDOC00006
PS00006  308->312  CK2_PHOSPHO_SITE   PDOC00006
PS00006  310->314  CK2_PHOSPHO_SITE   PDOC00006
PS00007  261->269  TYR_PHOSPHO_SITE   PDOC00007
PS00007  184->193  TYR_PHOSPHO_SITE   PDOC00007
PS00009  218->222  AMIDATION          PDOC00009
PS00009  439->443  AMIDATION          PDOC00009
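
A Prosite hit such as PS00001 (ASN_GLYCOSYLATION, pattern N-{P}-[ST]-{P}) can be reproduced with a regular expression. This is a rough sketch rather than the scanner used for the report: the names `ASN_GLYCOSYLATION` and `scan` are illustrative, and the fragment is residues 251-300 of the frame 2 peptide, so reported positions are offset by 251.

```python
import re

# PS00001: N, then any residue except P, then S or T, then any residue except P.
# The lookahead makes overlapping matches visible.
ASN_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def scan(fragment: str, first_residue: int) -> list[int]:
    """Return 1-based peptide positions where the pattern starts."""
    return [m.start() + first_residue for m in ASN_GLYCOSYLATION.finditer(fragment)]


# Residues 251-300 of the peptide, spaces removed.
fragment = "LKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM"

print(scan(fragment, first_residue=251))
```

The start position found in this fragment can be compared with the start of the PS00001 row in the table.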



(No Pfam data available for DKFZphtes3_7b22.2)

DKFZphtes3_7dl7



group: testes derived 

DKFZphtes3_7dl7 encodes a novel 633 amino acid protein with weak similarity to human KIAA0454. 
Pfam predicts a TNFR/NGFR cysteine-rich region. 

No informative BLAST results; no predictive Prosite or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570

1 GGGAAGTTAC GGCGAAGTCC ACCCAGCGTT TCTCAGGCAA TCTGAAGGCA 

51 AATCCTGTTT AGACCCAGGC GAAGGTTCCT GGTGACCCAG GCTCTCACCA 

101 GCCAATTGTC CCTTGCCGTC CTCCTGAGGG TATCTGGAGC TTCAGTGCTG 

151 TGTGCTCTTG GCCTCCACAC TGGGGATGCC ACTGACTCCC ACTGTCCAGG 

201 GCTTCCAGTG GACTCTCCGA GGCCCTGATG TAGAAACTTC CCCATTCGGT 

251 GCACCAAGAG CAGCCTCACA TGGTGTGGGC CGACATCAAG AGCTGCGAGA 

301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 

351 TATCTGCCGG CCCTTGGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA 

401 ATCAACAAGA AATCGCGCCC CCAGCTGGCA GAGAACAAAC AGCAGTTCAG 

451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA 

501 ACCGGCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 

551 ATGCTGAGGG ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 

601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 

651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGG GAGAGATGCC 

701 TCCCGCTCAT TGAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 

751 GGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA 

801 GGCTGGCACA GCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 

851 GAGGATGAAG ATGTTAAAGT TGAGGAGGCT GAGAAAGTAC AGGAATTATA 

901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGAGGACT 

951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 

1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 

1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATGG TTGGATGCTG 

1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 

1151 GGGCCAGTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 

1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 

1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 

1301 CAGCAAGTCG GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 

1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC 

1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTCGA TAGATTTTAT 

1451 TCAACTCCTT TTGAGTACCT GGAACTGCCT GACTTATGCC AGCCCTACAG 

1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 

1551 TGGACAGAAT GAAAAAGGAC CAAGAAGAGG AAGAAGACCA AGGCCCACCA 

1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 

1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCAGT TATCCAGAAC 

1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 

1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 

1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 

1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 

1901 TCGACTACTT CAACTTACTT TCAACTACAT GCCTCATTCC AGCAGTACAG 

1951 AAGTGCCTTT TACTCATTTG AGGAACAGGA CGTCAGCTTG GCCCTTGACG 

2001 TGGACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC 

2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 

2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 

2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 

2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CAGTCTGAAG 

2251 ACGTTGGACC CAAGTTAGGT GTGACACGTT CACACGACTA TGTAGCACAT 

2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 

2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 

2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 

2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGTA TCTTCAGTGT 

2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 

2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTGT TAACCCACTA

2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCAGCCTCCA ATTGATATCA

2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG GGAGGCCTTA






WO 01/12659 



PCT/IB00/01496 



2701 GTCCTGCTCC TTTCAATTCC ATCCTGTAAA GAACAGGAGT CAGGAGCCGC

2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC

2801 AATGCCATGT TCTTGCAGAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA

2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT

2901 CTCCTTCACA CAGTCCACGT CACCACGAAT CACACAACAA AAAGGAGGAG

2951 AGATATTTTG GGTTCAGAAG AAGTAAATGA TAATGTAGCT ACATTTCTTT

3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTGA

3051 TTTTGGTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA

3101 ATGGTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG

3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA

3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT

3251 CATGATATCA GGACTGGTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT

3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT

3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT

3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC

3451 GCTGGAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG

3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA

3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA

3601 AAAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 176 bp to 2074 bp; peptide length: 633 
Category: similarity to known protein 



1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS 

51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV 

101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ 

151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL 

201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA 

251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS 

301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD

351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT

401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE 

451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY 

501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP 

551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE 

601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7dl7, frame 2

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1,
Score = 199, P = 1e-11

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1,
Score = 158, P = 2.7e-07



>PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 
Length = 1,882 

HSPs : 

Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 74/261 (28%), Positives = 122/261 (46%)

Query:  117 EDCKDLIKSMLRDERLLT----EEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEG 172
            +D + LI+ + + E  L     EEKLAEEL  A    +Y  L+  Q REL+ LR+K++EG
Sbjct:  964 KDLESLIQRVSQLEAQLPKNGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023

Query:


17 3 


Sbjct: 


1024 


Query: 


226 


Sbjct: 


1084 


Que ry i 


285 


Sbjct: 


1140 


Query': 


343 


Sbjct: 


1197 


Score 


= 89 


Identities ! 


Query: 


464 


Sbjct: 


1079 


Query: 


519 


Sbjct : 


1139 


Score 


= 73 


Identities ■ 


Query: 


390 


Sbjct: 


1080 


Query: 


445 


Sbjct: 


1140 


Score 


= 68 


Identities • 


Query: 


31 


Sbjct: 


684 


Query: 


80 


Sbjct: 


744 


Query: 


138 


Sbjct: 


804 


Score 


= 65 


Identities ■ 


Query: 


123 


Sbjct: 


5 


Query: 


179 


Sbjct: 


61 


Score 


= 61 


Identities ■- 


Query: 


134 


Sbjct: 


855 


Query: 


189 


Sbjct: 


913 


Score 


= 57 


Identities = 


Query: 


127 


Sbjct: 


358 



3LNQH LQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 22 5 

+ +H + LL ++ D G+ REQLA+G +L + L KLS ++ 



E + +E L RE+Q+ E+ EV + L+ ++T S+SH +S++ +T 

EKDQAGLEPLA LRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 



+HEA P+ +S+ S + A 

iTHYEEKKAS PSHSDSIHHSSHSAVLSSKPSSTSASQGAK 1196 



ES + +L P + S FH 



-- 35/89 (39%), Positives = 44/89 (49%) 

KDQEEEEDQG PPCPRLSRELPEVVEP-EDLQDSLDRWYSTPFSYPELPDSCQ-PYGS 518 

KD + E+DQ P RLSREL E + E LQ LD TP S L DS + P + 



F S E E D+D + +Y EE + 



L1.0 bits), Expect = 4.8e+00, P = 9.9e-01 
31/88 (35%), Positives = 40/88 (45%) 

DQVKKEDQEATSP RLSRELLD-EKEPEVLQDSLDRFYSTPFEYLELPDLCQ-PYRSD 444 

3 ++DQ P RLSREL + EK EVLQ LD TP L D + P + 



F S L D+D + + EE + 



10.2 bits), Expect = l.le-01, P = 1.0e-01 
36/156 (23%), Positives = 68/156 (43%) 

SHGVGRHQELRDPTV PGPTSSATNVSMVVSAGPWS GEKAEMNILEINKK 79 

S G +HQE +TVPPS + V A G++++ + 

684 S PGKHQHQEEGNVTVRPFPRPQSLDLGATFTVDAHQLDNQSQPRDPGPQSAFSLPGSTQH 743 



R QL++ KQ++++L++K L+++ F AN Y + L+K 



G++E + + + E L+E L EG 



9.8 bits). Expect = 2.2e-01, P = 2.0e-01 
23/96 (23%), Positives = 52/96 (54%) 



++ + D+ + E + E+ EE LRQ ++ V ++ +L +LR+ L ++ t 

LRQRIHDKAVALERAIDEKFSALEEKEKELRQLRLAVRERDHDLERLRDVLS SNEA 60 



Q +++LL ++G ++ EQL+ C+ Q L +*+ 

rMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 93 

5.2 bits), Expect = 5.5e-01, P = 4.2e-01 
27/95 (28%), Positives = 47/95 (49%) 



+E K L +LG+ EE R Y +LV +++ L+ +LQ ++L +++L 



+S R R+ AG ++ SP + DEDE 

3SSLERP-RKLRAVGT LEGSSPHSVPDEDE 94 5 



bits), Expect = 1.4e+00, P = 7.5e-01

Query:


184 


QALLTPDEPDNSQ-GRDLREQLAEGCRLAQHLVQKL 218 








P++S+ R L+ +L EG ++ + ++++ 




Sbjct: 


416 


VKFHA — HPESSERDRTLQVEL-EGAQVLRSRLEEV 448 




Score 


= 54 


(8.1 bits), Expect = 2.7e+00, P = 9.3e-01 




Identities ■• 


= 61/264 (23%), Positives = 121/264 (45%) 




Query : 


3 


LTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQE — LRDPTVPGPTSSATNVSMVVS 


60 






L+ T Q QW L+ ++ET F + + + + L D SAT + + 




Sbjct: 


79 


LSTTCQNLQW-LK-EEMETK-FSRWQKEQESIIQQLQTSLHDRNKEVEDLSAT LLCK 


132 


Query: 


61 


AGPWSGEKAEMNILEINKKSR PQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYE 


117 






GP E AE + +K R L++ +Q L+ + + + ++ R+ 




Sbjct: 


133 


LGPGQSEIAEELCQRLQRKERMLQDLLSDRNKQV — LEHEMEIQGLLQSVSTREQE-SQA 


189 


Query : 


118 


DCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELT QLREKLQEG-- 


172 






+ L+++++ ER +L+LG+L ++ +Q+ E+T +L ++ +G 




Sbjct: 


190 


AAEKLVQALM--ERNSELQALRQYLGGRDSLMS-QAPI SNQQAEVTPTGRLGKQTDQGSM 


246 


Query : 


173 


RDASRSLNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKV 


232 






+ SR + LA P ++ G DL + +A G L ++LS N +E E + 




Sbjct : 


247 


QIPSRDDSTSLTAKEDVSIPRSTLG-DL-DTVA-G LEKELS--NAKEELELMAK 


295 


Query : 


233 


EEAEKVQELYAPREVQKAEEKEVPEDSLEECAIT 266 








+E E EL A + + +E+E+ + + ++T 




Sbjct : 


296 


KERESQMELSALQSMMAVQEEELQVQAADMESLT 329 




Score 


= 49 


(7.4 bits). Expect = 6.3e+00, P = 1.0e+00 




Identities : 


= 21/87 (24%), Positives = 39/87 (44%) 




Query: 


192 


PDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQELYAPREVQKAE 


251 






P ++Q LR QL++ + Q L +KL + +EEK + + +K + 




Sbjct : 


738 


PGSTQ--HLRSQLSQCKQRYQDLQEKLLLS SATVFAQANELEKYRVMLTGESLVKQD 


792 


Query : 


252 


EKEVPEDSLEECAI-TCSNSHHPCESNQ 278 








K++ D L++ TC S + E + 




Sbjct: 


793 


SKQIQVD-LQDLGYETCGRSENEAEREE 619 




Score 


= 46 


(6.9 bits). Expect - 6.3e+00, P = 1.0e+00 




Identities = 


= 19/77 (24%), Positives = 39/77 (50%) 




Query : 


112 


NNYDYEDCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQ- 


170 






+ ++ E+ K+ K + E ++T+E L+E QAE R+ + + + + L+E+L 




Sbjct : 


597 


DGWEIEEDKE— KGEVMVETVVTKEGLSESSLQAE-FRKLQGKLKNAHNI INLLKEQLVL 


653 


Query: 


171 


EGRDASRSLNQHLQALLT 188 








+ + + L L LT 




Sbjct : 


654 


SSKEGNSKLTPELLVHLT 671 
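
The percentages printed on the Identities and Positives lines can be rechecked from the raw counts. The sketch below assumes, judging only from the figures printed above (for example 121/264 reported as 45%), that the report truncates the percentage instead of rounding it. `blast_percent` is a hypothetical helper name and is not part of any BLAST distribution.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero (assumed report behaviour)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# (identities, positives, aligned length) copied from the three HSPs above.
HSPS = [(61, 121, 264), (21, 39, 87), (19, 39, 77)]

for identities, positives, length in HSPS:
    print(
        f"Identities = {identities}/{length} ({blast_percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({blast_percent(positives, length)}%)"
    )
```

With truncation, the recomputed values agree with the percentages shown in the listing; with conventional rounding, 121/264 would come out one point higher.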





Pedant information for DKFZphtes3_7dl7, frame 2 



Report for DKFZphtes3_7dl7.2 



[LENGTH] 633 

[MW] 72951.15 

[pI] 4.40 

[HOMOL] PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 2e-11 

[BLOCKS] BL00201E 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 14 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 2 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 4.90 % 

[KW] COILED_COIL 6.95 % 



SEQ MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS 

SEG 

PRD ccccceeeeeeeecccccccccccccccccccccccccccccccccccccceeeeeeeee 

COILS 

SEQ AGPWSGEKAEMNILEINKKSRPQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYEDCK 

SEG 

PRD ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhhhcccccccccch 

COILS 
SEQ DLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEGRDASRSLN 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQE 

SEG xxxxxxxxxxxxxxxx . . 

PRD hhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhh 

COILS CCCCCCC 

SEQ LYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLIDSS 

SEG 

PRD hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccceeeeecccccccccccc 

COILS 

SEQ SHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAPQESWDEGDWTLSIPPDMS 

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhheeeccccccchhhhhhcccccccccchhhhhhhccccccccccccccccccc 

COILS 

SEQ ASYQSDRSTFHSVEEQQVGLALDIGRHWCDQVKKEDQEATSPRLSRELLDEKEPEVLQDS 

SEG 

PRD ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccchhhhhhhhhhhheeeecc 

COILS 

SEQ LDRFYSTPFEYLELPDLCQPYRSDFYSLQEQHLGLALDLDRMKKDQEEEEDQGPPCPRLS 

SEG 

PRD hhhhhccceeeeecccccccccccchhhhhhhhhhhhhcchhhhhhhhhhcccccccccc 

COILS 

SEQ RELPEVVEPEDLQDSLDRWYSTPFSYPELPDSCQPYGSCFYSLEEEHVGFSLDVDEIEKY 

SEG 

PRD ccceeeeeccchhhhhhhhhccccccccccccccccccceeeeccceeeccccchhhhhh 

COILS 

SEQ QEGEEDQKPPCPRLNEVLMEAEEPEVLQDSLDRCYSTTSTYFQT.HASFQOYRSAFYSFEE 

SEG 

PRD hcccccccccccchhhhhhhhhchhhhhccccceeecceeeehhhhhhhhhhhhhhhhhc 

COILS 

SEQ QDVSLALDVDNRFFTLTVIRHHLAFQMGVIFPH 

SEG 

PRD cchhhhhhcccchhhhhhhhhhhhhhhhhcccc 

COILS 



Prosite for DKFZphtes3_7dl7.2 

PS00001 54->58 ASN_GLYCOSYLATION PDOC00001 
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001 
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005 
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005 
PS00005 365->368 PKC_PHOSPHO_SITE PDOC00005 
PS00005 401->404 PKC_PHOSPHO_SITE PDOC00005 
PS00006 188->192 CK2_PHOSPHO_SITE PDOC00006 
PS00006 259->263 CK2_PHOSPHO_SITE PDOC00006 
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006 
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006 
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006 
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006 
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006 
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006 
PS00006 372->376 CK2_PHOSPHO_SITE PDOC00006 
PS00006 427->431 CK2_PHOSPHO_SITE PDOC00006 
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006 
PS00006 505->509 CK2_PHOSPHO_SITE PDOC00006 
PS00006 522->526 CK2_PHOSPHO_SITE PDOC00006 
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006 
PS00008 25->31 MYRISTYL PDOC00008 
PS00008 207->213 MYRISTYL PDOC00008 



Pfam for DKFZphtes3_7dl7.2 

HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtDWNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC* 

C+ ++ + N+ ++ + ++ + +++ +++ ++VC 

Query 274 CESNQPYG-NT-RITFEEDQVDS--TLIDSSSHDEWLDAVC 310 
DKFZphtes3_7j3 



group: cell cycle 

DKFZphtes3_7j3.2 encodes a novel 628 amino acid putative protein kinase, which is related to 
the C-TAK1 Cdc25C-associated protein kinase. 

Cdc25C is a protein phosphatase that controls entry into mitosis by dephosphorylating Cdc2. 
Cdc25C function is itself regulated by phosphorylation: phosphorylation of serine 216 
mediates the binding of 14-3-3 protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein 
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to 
C-TAK1 and is therefore likely to be involved in cell-cycle regulation as well. 

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to serine/threonine-specific protein kinases 

complete cDNA, complete cds, potential start at Bp 128, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3443 bp 

Poly A stretch at pos. 3399, polyadenylation signal at pos. 3376 



1 GTGCTTTACT GCGCGCTCTG GTACTGCTGT GGCTCCCCGT CCTGGTGCGG 

51 GACCTGTGCC CCGCGCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 

101 GCCGCCCTTG CTCACCTCCT GCTCGCCATG GAGTCGCTGG TTTTCGCGCG 

151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA GCTAGCCCGG CCGCTGGCGG 

201 AAGGGCTGAT CAAGTCGCCC AAGCCCCTAA TGAAGAAGCA GGCGGTGAAG 

251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CGCTACGAGT TCCTGGAGAC 

301 CCTGGGCAAA GGCACCTACG GGAAGGTGAA GAAGGCGCGG GAGAGCTCGG 

351 GGCGCCTGGT GGCCATCAAG TCAATCCGGA AGGACAAAAT CAAAGATGAG 

401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 

451 CCACCCTCAC ATCATTGCCA TCCATGAAGT GTTTGAGAAC AGCAGCAAGA 

501 TCGTGATCGT CATGGAGTAT GCCAGCCGGG GCGACCTTTA TGACTACATC 

551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA GCTAGGCATT TCTTCCGGCA 

601 GATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 

651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 

701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGCAGAC 

751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC 

801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 

851 ATCCTGGTGC ATGGCACCAT GCCCTTTGAT GGGCATGACC ATAAGATCCT 

901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 

951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 

1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 

1051 CACCCGAGTG GGAGAGCAGG AGGCTCCGCA TGAGGGTGGG CACCCTGGCA 

1101 GTGACTCTGC CCGCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 

1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 

1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 

1251 AGAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 

1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 

1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGGG GTACAGGAGG 

1401 ACCCTCCGGA GCTCAGCCCA ATCCCTGCGA GCCCAGGGCA GGCTGCCCCG 

1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGG 

1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 

1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 

1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 

1651 CTCCCAGACA GCCTTGGAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 

1701 ATGAACTCGC CCCACCTCGC CCCCTGGCCC GGCCCAGCCG ACCCTCAGGG 

1751 GCTGTGAGCG AGGACAGCAT CCTGTCCTCT GAGTCCTTTG ACCAGCTGGA 

1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTGT GTGTCTGTGG 

1851 ACAACCTCAC GGGGCTTGAG GAGCCCCCCT CAGAGGGCCC TGGAAGCTGC 

1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 

1951 AGACTGCCAG GAGGTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 

2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 

2051 AGATGCAGCT GGTTGCACCC CGAGGGGAGA TGCCTTCTCC CCCACCTCCC 

2101 AGGACCTGCA TCCCAGCTCA GAAGGCTGAG AGGGTTTGCA GTGGAGCCCT 

2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 

2201 GTGTCTGTCT TCAGCCCTGC TGAACGAAGA GGATACTAAA GAGAGGGGAA 

2251 CGGGAATGCC CGCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 

2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 

2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC 

2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 

2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 

2501 GCATCCTGGG AATGGTCTGG AGTAACGCTT CGTTATTTTT ATTTTTATTT 






2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG 

2601 GCTAGAGTGC AATGGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG 

2651 GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC 

2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT 

2751 CTCCATGTTG GTCAGGCTGG TCTCAAACTC CCGACCTCAG GTGATCCACC 

2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCGCGCCC 

2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT 

2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC 

2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC 

3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG 

3051 ACCTCAGTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT 

3101 ATGGATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT 

3151 TTATGTTCTT GGCTTTGTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA 

3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG 

3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC 

3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCTA 

3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA 

3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



98202387: C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and 
promotes 14-3-3 protein binding. 



Peptide information for frame 2 



ORF from 128 bp to 2011 bp; peptide length: 628 
Category: strong similarity to known protein 



1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR 

51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE 

101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER 

151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY 

201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF 

251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH 

301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV 

351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG 

401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK 

451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG 

501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS 

551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL 

601 GDSCFSLTDC QEVTATYRQA LRVCSKLT 
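
The ORF coordinates and the peptide listing can be cross-checked against each other. The sketch below is a minimal illustration: it assumes the standard genetic code, 1-based inclusive coordinates, and an end coordinate that excludes the stop codon (the arithmetic for 128 to 2011 bp is consistent with that reading). The 40-base fragment is nucleotides 121 to 160 of the cDNA listing above; `orf_peptide_length` and `translate` are hypothetical helper names.

```python
# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in a 1-based, inclusive ORF span (stop codon excluded)."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 121-160 of the DKFZphtes3_7j3 listing (four blocks of ten).
FRAGMENT_START = 121
FRAGMENT = "GCTCGCCATG" "GAGTCGCTGG" "TTTTCGCGCG" "GCGCTCCGGC"
ORF_START = 128

print(orf_peptide_length(128, 2011))                   # codons in the reported ORF
print(translate(FRAGMENT[ORF_START - FRAGMENT_START:]))  # first residues of frame 2
```

The translated fragment should reproduce the first residues of the peptide listing, which is a quick way to confirm that the nucleotide rows were read in the right order.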

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j3, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7j3, frame 2 



Report for DKFZphtes3_7j3.2 



[LENGTH] 628 

[MW] 69612.39 

[pI] 9.01 

[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens 
mRNA for KIAA0537 protein, complete cds. 1e-152 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 6e-44 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-42 
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139w CTK1 - carboxy-terminal domain] 2e-28 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-28 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26 
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 5e-24 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 6e-24 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YNR031c] 1e-22 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 8e-22 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 6e-21 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 6e-21 
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-19 
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YDL159w] 3e-18 
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 1e-17 
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-17 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 4e-16 
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 1e-15 
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 5e-15 
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-12 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 2e-08 
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 8e-05 
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 8e-05 
[BLOCKS] BL00479C Phorbol esters / diacylglycerol binding domain proteins 
[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins 
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins 
[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 1e-77 
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 4e-68 
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain [Caenorhabditi 2e-85 
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 1e-80 
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 2e-76 
[SCOP] d1irk 5.1.1.2.4 insulin receptor (Human (Homo sapiens) 1e-69 
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-84 
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-68 
[SCOP] d1ydre 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 9e-85 
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 1e-69 
[SCOP] d1cdka 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 1e-85 
[SCOP] d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-66 
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-47 
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-75 
[SCOP] d1ckja_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 5e-54 
[EC] 2.7.1.38 Phosphorylase kinase 1e-36 
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e-40 
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 1e-61 
[EC] 2.7.1.117 Myosin-light-chain kinase 2e-40 
[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 1e-61 
[EC] 2.7.1.37 Protein kinase 7e-42 
[PIRKW] phosphotransferase 6e-66 
[PIRKW] nucleus 1e-64 
[PIRKW] calcium 7e-35 
[PIRKW] duplication 1e-38 
[PIRKW] tandem repeat 4e-39 
[PIRKW] phorbol ester binding 1e-38 
[PIRKW] zinc 1e-38 
[PIRKW] cell cycle control 1e-42 
[PIRKW] serine/threonine-specific protein kinase 8e-68 
[PIRKW] oncogene 1e-40 
[PIRKW] phospholipid binding 1e-38 
[PIRKW] autophosphorylation 1e-64 
[PIRKW] brain 1e-40 
[PIRKW] heterotetramer 2e-36 
[PIRKW] mitosis 7e-42 
[PIRKW] polymer 1e-35 
[PIRKW] magnesium 6e-66 
[PIRKW] ATP 8e-68 
[PIRKW] polyprotein 1e-40 
[PIRKW] phosphoprotein 1e-64 
[PIRKW] apoptosis 4e-39 
[PIRKW] glycoprotein 7e-42 
[PIRKW] leucine zipper 3e-35 
[PIRKW] skeletal muscle 7e-35 
[PIRKW] protein kinase 5e-41 
[PIRKW] cAMP binding 3e-38 
[PIRKW] testis 9e-36 
[PIRKW] purine nucleotide binding 2e-49 
[PIRKW] calcium binding 8e-39 
[PIRKW] alternative splicing 3e-37 
[PIRKW] P-loop 2e-49 
[PIRKW] lipoprotein 2e-33 
[PIRKW] segmentation 1e-33 
[PIRKW] core protein 1e-40 
[PIRKW] muscle 7e-35 
[PIRKW] myristylation 2e-33 
[PIRKW] EF hand 8e-39 
[PIRKW] cell division 2e-40 
[PIRKW] calmodulin binding 4e-40 
[SUPFAM] ribosomal protein S6 kinase II 5e-36 
[SUPFAM] fibronectin type III repeat homology 3e-33 
[SUPFAM] immunoglobulin homology 3e-33 
[SUPFAM] calcium-dependent protein kinase 8e-39 
[SUPFAM] AMP-activated protein kinase 6e-66 
[SUPFAM] protein kinase akt 3e-42 
[SUPFAM] protein kinase SPK1 1e-42 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-37 
[SUPFAM] calmodulin repeat homology 8e-39 
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 6e-33 
[SUPFAM] protein kinase C zeta 1e-36 
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-34 
[SUPFAM] death-associated protein kinase 4e-39 
[SUPFAM] pleckstrin repeat homology 3e-42 
[SUPFAM] ankyrin repeat homology 4e-39 
[SUPFAM] protein kinase homology 8e-68 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-41 
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-38 
[SUPFAM] twitchin 3e-33 
[SUPFAM] protein kinase C delta 1e-38 
[SUPFAM] cGMP-dependent protein kinase 6e-33 
[SUPFAM] protein kinase cdr1 7e-42 
[SUPFAM] protein kinase C C2 region homology 3e-37 
[SUPFAM] protein kinase C alpha 3e-37 
[SUPFAM] yeast protein kinase C 5e-36 
[SUPFAM] kinase-related transforming protein 1e-41 
[SUPFAM] kinase interaction domain homology 1e-42 
[SUPFAM] gag-akt polyprotein 1e-40 
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 4e-40 
[SUPFAM] protein kinase C mu 4e-33 




[PROSITE] PROTEIN_KINASE_ATP 2 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 3 
[PROSITE] CK2_PHOSPHO_SITE 13 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 2 
[PROSITE] PROTEIN_KINASE_ST 1 

[PFAM] Eukaryotic protein kinase domain 

[KW] All_Alpha 

[KW] 3D 

[KW] LOW_COMPLEXITY 10.51 % 

SEQ MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAVKRHHHKHNLRHRYEFLETLG 

SEG xxxxxxxxxxxx 

1ctpE HHHHHHHHHHHHHHHCCCCCCCC--GGGEEEEEEEE 

SEQ KGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREIEIMSSLNHPHIIAIHEVFE 

SEG 

1ctpE CTTTEEEEEEEETTTEEEEEEEEEHHHHHHHCCHHHHHHHHHHHHCCCTTTBCCEEEEEE 

SEQ NSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIVSAVHYCHQNRVVHRDLKLEN 

SEG 

1ctpE ETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHCCEECCCCCGGG 

SEQ ILLDANGNIKIADFGLSNLYHQGKFLQTFCGSPLYASPEIVNGKPYTGPEVDSWSLGVLL 

SEG 

1ctpE EEETTTTCEEECCTTTTEET-TTT-BCCCCCCGGGCCHHHHHCCCBC-HHHHHHHHHHHH 

SEQ YILVHGTMPFDGHDHKILVKQISNGAYREPPKPSDACGLIRWLLMVNPTRRATLEDVASH 

SEG 

1ctpE HHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTCHHHHHHHHHTTTTTGGGTTTHHHHHHC 

SEQ WWVNWGYATRVGEQEAPHEGGHPGSDSARASMADWLRRSSRPLLENGAKVCSFFKQHAPG 

SEG 

1ctpE GG 

SEQ GGSTTPGLERQHSLKKSRKENDMAQSLHSDTADDTAHRPGKSNLKLPKGILKKKVSASAE 

SEG 

1ctpE 

SEQ GVQEDPPELSPIPASPGQAAPLLPKKGILKKPRQRESGYYSSPEPSESGELLDAGDVFVS 

SEG xxxxxxxxxxxx . . . xxxxxxxxxxxxxxx 

1ctpE 

SEQ GDPKEQKPPQASGLLLHRKGILKLNGKFSQTALELAAPTTFGSLDELAPPRPLARASRPS 

SEG xxxxxxxxxxxxxx 

1ctpE 

SEQ GAVSEDSILSSESFDQLDLPERLPEPPLRGCVSVDNLTGLEEPPSEGPGSCLRRWRQDPL 

SEG xxxxxxxxxxxxx 

1ctpE 

SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT 

SEG 

1ctpE 



Prosite for DKFZphtes3_7j3.2 

PS00001 121->125 ASN_GLYCOSYLATION PDOC00001 
PS00001 576->580 ASN_GLYCOSYLATION PDOC00001 
PS00004 290->294 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 337->341 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 413->417 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005 
PS00005 74->77 PKC_PHOSPHO_SITE PDOC00005 
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005 
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005 
PS00005 142->145 PKC_PHOSPHO_SITE PDOC00005 
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005 
PS00005 289->292 PKC_PHOSPHO_SITE PDOC00005 
PS00005 327->330 PKC_PHOSPHO_SITE PDOC00005 
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005 
PS00005 373->376 PKC_PHOSPHO_SITE PDOC00005 
PS00005 377->380 PKC_PHOSPHO_SITE PDOC00005 
PS00005 616->619 PKC_PHOSPHO_SITE PDOC00005 
PS00006 15->19 CK2_PHOSPHO_SITE PDOC00006 
PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006 
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006 
PS00006 227->231 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006 
PS00006 461->465 CK2_PHOSPHO_SITE PDOC00006 
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006 
PS00006 523->527 CK2_PHOSPHO_SITE PDOC00006 
PS00006 578->582 CK2_PHOSPHO_SITE PDOC00006 
PS00006 606->610 CK2_PHOSPHO_SITE PDOC00006 
PS00007 453->460 TYR_PHOSPHO_SITE PDOC00007 
PS00007 453->461 TYR_PHOSPHO_SITE PDOC00007 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 324->330 MYRISTYL PDOC00008 
PS00008 347->353 MYRISTYL PDOC00008 
PS00008 360->366 MYRISTYL PDOC00008 
PS00016 134->137 RGD PDOC00016 
PS00107 59->82 PROTEIN_KINASE_ATP PDOC00100 
PS00107 59->86 PROTEIN_KINASE_ATP PDOC00100 
PS00108 171->184 PROTEIN_KINASE_ST PDOC00100 
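
Several of the short motifs in the table above can be rechecked with plain regular expressions. The sketch below is illustrative only: the regular expressions are hand translations of the published PROSITE patterns for PS00001, PS00005, PS00006 and PS00016, the fragment is residues 101 to 160 of the frame 2 peptide listed earlier, and `scan` is a hypothetical helper. Reporting each hit as start->(start + motif length) is an assumption inferred from the ranges printed in the table.

```python
import re

# Residues 101-160 of the DKFZphtes3_7j3 frame 2 peptide.
FRAGMENT_START = 101
FRAGMENT = "IEIMSSLNHPHIIAIHEVFENSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIV"

# Regex forms of four short PROSITE patterns.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # PS00006: [ST]-x(2)-[DE]
    "RGD": r"RGD",                          # PS00016: R-G-D
}


def scan(sequence: str, pattern: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, start + length) for every match, overlaps included."""
    # A lookahead with a capture group lets overlapping matches be reported.
    lookahead = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + offset, m.start() + offset + len(m.group(1)))
        for m in lookahead.finditer(sequence)
    ]


for name, pattern in PATTERNS.items():
    for start, end in scan(FRAGMENT, pattern, FRAGMENT_START):
        print(f"{start}->{end} {name}")
```

For this fragment the hits coincide with the table rows that fall between residues 101 and 160, which gives some confidence in the repaired coordinates.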



Pfam for DKFZphtes3_7j3.2 

HMM_NAME Eukaryotic protein kinase domain 

HMM *YeigRilGeGsFGtVYkCiWrTGelVAIKIIkkrsms Fl RE I 

YE+++++G+G++G+V+K+++ +G++VAIK I+K++++ ++REI 

Query 53 YEFLETLGKGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREI 101 

HMM CJIMRrLnHPNIIRFYDwFedddDHI YMIMEYMeGGDLFDYI rrngpMsEw 

+IM +LNHP+II + ++FE ++ I ++MEY+ GDL+DYI+++ ++SE+ 

Query 102 EIMSSLNHPHIIAIHEVFE-NSSKIVIVMEYASRGDLYDYISERQQLSER 150 

HMM elrflMyQILrGMeYLHSMgllHRDLKPENILIDeNgqlKIcDFGLARqM 

E+R++++QI++++ Y+H ++++HRDLK ENIL+D NG+IKI+DFGL+ ++ 

Query 151 EARHFFRQIVSAVHYCHQNRVVHRDLKLENILLDANGNIKIADFGLSNLY 200 

HMM nnYe rMt t f CGT PWYMMAPEVI Img . nyYt t kVDMWS FGCILWEMMTGep 

+ + ++ TFCG+P Y +PE+ ++G +Y +++VD WS+G++L++++ G+ 

Query 201 HQGKFLQTFCGSPLYA-SPEI-VNGKPYTGPEVDSWSLGVLLYILVHGTM 248 

HMM PFyddnMeralmrliqrf rrpfWpnCSeElyDFMrwCWnyDPekRPTFrQI 

PF+++ ++ I + +++ +P S+ + ++RW++ ++P++R T +++ 

Query 249 PFDGHDHKILVKQISNGAYREPPKPSD-ACGLIRWLLMVNPTRRATLEDV 297 

HMM LnHPWF* 

H W+ 

Query 298 ASHWWV 303 
DKFZphtes3_7j8 



group: testes derived 

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human 
WUGSC:H_DJ1159O04.1. 

The novel protein contains an additional C-terminal domain, which is not present in 
WUGSC:H_DJ1159O04.1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

WUGSC:H_DJ1159O04.1, similarity to YBL104p 

verifies and extends the genmodel WUGSC:H_DJ1159O04.1 
similarity to S. cerevisiae YBL104p 

Sequenced by BMFZ 

Locus: /map="7p21-p22" 

Insert length: 3353 bp 

Poly A stretch at pos. 3231, no polyadenylation signal found 



1 GCAAAATATG TTGTATTTGT GGCATAGTTC ATATTTACAC TATCATAAAA 

51 TTATGGCCGA GAAGTTAAAT ATTCTAAATG TGTCAACATA GTTCTCTGTA 

101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 

151 TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT 

201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 

251 TTACAGCTTT GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC 

301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 

351 TGGCATTGTT CAACTTGGAT ATTCGCCGAG CAATCCAAAT CCTGAATGAA 

401 GGGGCATCTT CTGAAAAAGG AGATCTGAAT CTCAATGTGG TAGCAATGGC 

451 TTTATCGGGT TATACGGATG AGAAGAACTC CCTTTGGAGA GAAATGTGTA 

501 GCACACTGCG ATTACAGCTA AATAACCCGT ATTTGTGTGT CATGTTTGCA 

551 TTTCTGACAA GTGAAACAGG ATCTTACGAT GGAGTTTTGT ATGAAAACAA 

601 AGTTGCAGTA CGTGACAGAG TGGCATTTGC TTGTAAATTC CTTAGTGATA 

651 CTCAGTTAAA TAGATACATC GAAAAGTTGA CCAATGAAAT GAAAGAGGCT 

701 GGAAATTTGG AAGGAATTTT GCTTACAGGC CTTACTAAAG ATGGAGTGGA 

751 CTTAATGGAG AGTTATGTTG ATAGAACTGG AGATGTTCAA ACAGCAAGTT 

801 ACTGTATGTT ACAGGGTTCA CCTTTAGATG TTCTTAAAGA TGAAAGGGTT 

851 CAGTACTGGA TTGAGAATTA TAGAAATTTA TTAGATGCCT GGAGGTTTTG 

901 GCATAAACGA GCTGAATTTG ATATTCACAG GACTAAGTTG GATCCCAGTT 

951 CCAAGCCTTT AGCACAAGTT TTTGTGAGTT GCAATTTCTG TGGCAAGTCA 

1001 ATCTCCTACA GCTGTTCAGC TGTGCCTCAT CAGGGCAGAG GTTTTAGTCA 

1051 GTATGGTGTG AGTGGCTCAC CAACGAAATC TAAAGTCACA AGTTGTCCTG 

1101 GCTGTCGAAA ACCACTTCCT CGATGTGCGC TTTGTCTCAT TAATATGGGA 

1151 ACACCAGTTT CTAGCTGTCC TGGAGGAACC AAATCAGATG AAAAAGTGGA 

1201 CTTGAGCAAG GACAAAAAAT TAGCCCAATT TAACAACTGG TTTACATGGT 

1251 GTCATAATTG CAGGCACGGT GGACATGCTG GACATATGCT TAGTTGGTTC 

1301 AGGGACCATG CAGAGTGCCC TGTGTCTGCA TGCACGTGTA AATGTATGCA 

1351 GTTGGATACA ACGGGGAATC TGGTACCTGC AGAGACTGTC CAGCCATAAA 

1401 ATGTTACCAC CTTAAGAGAA CCCTTCAAGT GTGGAGCTTT CTAGTAGGTG 

1451 TCCTTCATAG CTCAGAAACA TACCTCAGAA CAAGCCATTC ATGACTTACC 

1501 TGTAATGGGA AAATAAATCA TTCTATCAGA TCAGCAGTTT TGATGTTTGA 

1551 GTGATTTTGA TATGCTTCAC AGAGACAAAT GCTGCCAAAA TAAACATCGA 

1601 AGTATAGACA TGAGTTCTGT TCAGCAGGTT GAAAAGTCTG ATTTAGAAAA 

1651 ACTTTCTAAG TTTTGGTTGA AATTATGAAC ACTCTAGAAG CAGAATTTCT 

1701 GGAAGAGCCA AGAACAGACT TTGAGCCTAT ATCTTCAAAG CTGAAACTGG 

1751 ATATCTTTCA ATAAAATATG TGCACTTTTA AAATAAAATG ACTAATTCTG 

1801 TGATTCAGAC AATAGTTTTA AGTTCAGCTG TGCTTAGATT TCTTTCAGAT 

1851 TAATTTAAAA TTATAGATTT TTACTTTTAG AATTGCAGAG CCCCTATCCC 

1901 ACACTGGAGA ATATTTTTTA TTACTGTCTG TTATATATGT GTCTATGTGT 

1951 GTGTGTATAT TTATGTGTGT ATGTATAAAT ATGTACTTTT TAAAGGAGCC 

2001 TTTTCCCTCC TTTGATTTTA AGATAAGCAA TCTTTTGGCA TAACATTATC 

2051 GTCTTCCTAG AAAAGCCAAG ATGAAGAATC TATCTTACAA CTTTTTCTCT 

2101 TCAGTAGAGA AAAACATGTA CCATTTCAGG TGAACATACA AAATTTTCAC 

2151 TTTCTACCTT TTGCCTTCCA ATGTCCTGAT TTGTCTTCAA AGGTTTTTCT 

2201 CCATATTAAT TTGTCATCTT ATCCTCATCA CCTGAGAACA TTTTACTGCA 

2251 TACAAAGTCT ATGCAAGATT ATATGTAACT AGCCATTTAG TATAATCTAT 

2301 GTCAGTGTTT CTGTGCTGTC AAATTCCGTC CTGATTTGGA ATACCATACC 

2351 TTGTTCTTTC CAAGGTAGAC TAGGAAGTGT TGGGGAAATA GGGTCACTTC 

2401 AGAGACCATT TTAGATGTAA GTTTTTAAAT GTAAGTGTTA CTGGGGCTAA 

2451 GTCAGGGACT TTATTTAAAA CATTTTTTTT TTCTCATTTC ATAGCTAGAT 

2501 AGTTGTAAGA GAAATACAAA GAATTTACAA GATGCTTCTC TGTCATCTGC 
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTGTGTTCAT 

2601 AAAGAGTAAA TCTTATTTTA TAGATTTTGG AGAAATAAAA CAAGAATTTT 

2651 AAGAGCTTTC GTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG 

2701 ATTAGTTTGA GGTGTAACCT AAATATTAAA AGTAGATTAA ATTTATTTTT 

2751 TACCTTGAGT GTCTGATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAG 

2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT 

2851 GAAAGACATG GTAATTGCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC 

2901 TTTTTCCAGC CTTCATTTGA GTAAATCTTA ATTGATTTCA TTTTATTAAC 

2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA 

3001 CTGTCTTAAA TATGAAAGTC AGCTTTAAGT AATGTCAGAC TCATATGCAT 

3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC 

3101 AAGTTTTGGA AATACAGTAT AAAACATGAA TGTAAAGTCT ATTATGTAAT 

3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT 

3201 GTATGTGTCA ACCTCTTAAA TGTTTTCTGT GAAAAAAAAA AAAAAAAAAA 

3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 167 bp to 1396 bp; peptide length: 410 
Category: known protein 
Classification: unclassified 
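
The ORF coordinates, the reading frame and the peptide length in this header are linked by simple arithmetic. The following Python sketch reproduces that link under two assumptions that the report does not state explicitly: the coordinates are 1-based and inclusive, and the range excludes the stop codon. The function name `orf_summary` is illustrative only.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading_frame, peptide_length) for a 1-based, inclusive ORF range.

    Assumes the range covers whole codons and does not include the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1 starts at nucleotide 1
    return frame, span // 3


# ORF from 167 bp to 1396 bp, reported as frame 2 with peptide length 410
print(orf_summary(167, 1396))
```

Under these assumptions the result agrees with the frame and peptide length given in the header.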



1 MVESSRHNWS GLDKQSDIQN LNEERILALQ LCGWIKKGTD VDVGPFLNSL 

51 VQEGEWERAA AVALFNLDIR RAIQILNEGA SSEKGDLNLN VVAMALSGYT 

101 DEKNSLWREM CSTLRLQLNN PYLCVMFAFL TSETGSYDGV LYENKVAVRD 

151 RVAFACKFLS DTQLNRYIEK LTNEMKEAGN LEGILLTGLT KDGVDLMESY 

201 VDRTGDVQTA SYCMLQGSPL DVLKDERVQY WIENYRNLLD AWRFWHKRAE 

251 FDIHRSKLDP SSKPLAQVFV SCNFCGKSIS YSCSAVPHQG RGFSQYGVSG 

301 SPTKSKVTSC PGCRKPLPRC ALCLINMGTP VSSCPGGTKS DEKVDLSKDK 

351 KLAQFNNWFT WCHNCRHGGH AGHMLSWFRD HAECPVSACT CKCMQLDTTG 

401 NLVPAETVQP 



BLAST P hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_7j8, frame 2 



PIR:S45391 probable membrane protein YBL104C - yeast (Saccharomyces 
cerevisiae) , N = 2, Score = 446, P = 4.5e-47 

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 

DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P = 
7.6e-211 
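
In BLAST-style statistics the P value is the Poisson probability of at least one chance hit given the Expect value E, that is P = 1 - exp(-E). For very small values the two are numerically identical, which is why hits of this strength show the same figure for both. A minimal Python sketch of the conversion follows; the function name is illustrative.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number of hits."""
    # expm1 keeps precision when expect is extremely small
    return -math.expm1(-expect)


print(p_from_expect(7.6e-211))  # indistinguishable from E itself
print(p_from_expect(5.8))       # approaches 1 for large E
```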



>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone 
DJ1159O04 from 7p21-p22, complete sequence. 
Length = 379 



HSPs: 



Score = 2038 (305.8 bits), Expect = 7.6e-211, P = 7.6e-211 
Identities = 379/379 (100%), Positives = 379/379 (100%) 



Query: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60 

MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 

Sbjct: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60 

Query: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 

AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 

Sbjct: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120 

Query: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 

Sbjct: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180 

Query: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 

Sbjct: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240 

Query: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 

Sbjct: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300 

Query: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 

Sbjct: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360 

Query: 361 WCHNCRHGGHAGHMLSWFR 379 

WCHNCRHGGHAGHMLSWFR 

Sbjct: 361 WCHNCRHGGHAGHMLSWFR 379 





Pedant information for DKFZphtes3_7j8, frame 2 



Report for DKFZphtes3_7j8.2 



[LENGTH] 410 

[MW] 45862.45 

[pi] 6.51 

[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04 
from 7p21-p22, complete sequence. 0.0 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48 

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 

[BLOCKS] BL00534A Ferrochelatase proteins 

[PIRKW] transmembrane protein 2e-46 

[KW] All_Alpha 



SEQ MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 

PRD cccccccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccccchhhhh 

SEQ AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhccc 

SEQ PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 

PRD ccccceeeccccccccccceeeccchhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcc 

SEQ LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 

PRD cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhhhhhhhhhhhhhhhh 

SEQ AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 

PRD hhhhhhhhhhhhhhcccccccccceeeeeeeccccccccccccccccccccccccccccc 

SEQ SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 

PRD ccccccccccccccccccceeeeecccccccccccccccccceeeehhhhhhhhhcceee 

SEQ WCHNCRHGGHAGHMLSWFRDHAECPVSACTCKCMQLDTTGNLVPAETVQP 

PRD eecccccccccchhhhhhhhhccccccccccccccccccccccccccccc 



(No Prosite data available for DKFZphtes3_7j8.2) 
(No Pfam data available for DKFZphtes3_7j8.2) 

DKFZphtes3_7p10 



group: Cell Cycle 

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to 
the Xenopus laevis XPMC2 protein. 

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after 
completion of DNA synthesis. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a 
mitotic catastrophe phenotype. Xenopus XPMC2 rescues several different yeast mitotic 
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in 
the nucleus in Xenopus oocytes. The new protein is the human orthologue of XPMC2. 

The new protein can find application in modulating/blocking the cell cycle. 



strong similarity to XPMC2 protein 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="9q34" 
Insert length: 2380 bp 

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318 



1 AGCGTGCGTG CTGAGGTATG CGCAACGCGT GCGGGGTCTC TTCCGGAGTC 
51 TTTTCCTGGA CGGGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 
101 AGGCGCTTGT GCTGCCAGGG CGCCGGGCCC GGGGAGGCCG GGGTCTCGGG 
151 TGGCCGCCGG CCCAGGCGCT GGACGGCAGC AGGATGGGGA AGGCGAAGGT 
201 CCCCGCCTCC AAGCGCGCCC CGAGCAGCCC CGTGGCTAAG CCGGGTCCTG 
251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 
301 AGCAAGGCGC GGGAAGTAAG CAAGAAGCCA GCAAGCGGCC CCGGTGCTGT 
351 GGTGCGACCT CCAAAGGCAC CAGAAGACTT TTCTCAAAAC TGGAAGGCGC 
401 TGCAAGAGTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAAGCCTCTT 
451 GTCATCTCTC AGATGGGTTC CAAAAAGAAG CCCAAAATTA TCCAGCAAAA 
501 CAAAAAAGAG ACCTCGCCTC AAGTGAAGGG AGAGGAGATG CCGGCAGGAA 
551 AAGACCAGGA GGCCAGCAGG GGCTCTGTTC CTTCAGGTTC CAAGATGGAC 
601 AGGAGGGCGC CAGTACCTCG CACCAAGGCC AGTGGAACAG AGCACAATAA 
651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTGTTCCA GAACGAGGGG 
701 ACATCGAGCA TAAGAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 
751 ACCGAGGAAG ACATCTGGTT TGACGACGTG GACCCAGCGG ATATCGAAGC 
801 TGCCATAGGT CCAGAGGCGG CCAAGATAGC GAGGAAACAG TTGGGTCAGA 
851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCGGCGGC 
901 CTGACAAGAG CCTTAGCCTT GGACTGTGAG ATGGTGGGCG TGGGCCCTAA 
951 GGGGGAGGAG AGCATGGCCG CCCGTGTGTC CATCGTGAAC CAGTATGGGA 
1001 AGTGCGTTTA TGACAAGTAC GTCAAACCAA CTGAGCCCGT GACGGACTAT 
1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA 
1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA GATGCTGAAG GGCAGAATTC 
1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 
1201 CCAAAAAAGA AGATTCGGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 
1251 AGTAAAGAGT GGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 
1301 GGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 
1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG GAGTGGGAGA GCATGGCCCG 
1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT 
1451 AGCAGTCCTG CCCTGCTGCT GCTGCCGCCC CGCTACAGAG GCAATGTGAC 
1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 
1551 CTTTTCAGAA TCATGGCAGA GGGGCGTGGC GTGGTGCTAC TGAGAAGGTC 
1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 
1651 TGTTTGGGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 
1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TGAGGTTGGG 
1751 GCCTGTGGGC CGCCAGTCCA TACGGTGCTG TCACTGCCCA TCTTCGGTGA 
1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC 
1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCTT GAGGGACCCA 
1901 GACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 
1951 GTGTTACATC AGGGGTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 
2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 
2051 CCCCATAGCA CGAGGATGTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 
2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 
2151 TGTGATGCTA GCCGGTGGCT TTCTGGGCTA GGCCCCAGTT TGAGGCTCCC 
2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAG GGGACGGAGT 
2251 CCAAGGCGTT ATTGGGCCAC CTGACAGCTG GACAGAAAAG GGGCAGACAC 
2301 ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
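
The poly(A) annotations for this insert can be checked with a short scan. The Python sketch below is only an illustration: it assumes the canonical hexamer AATAAA as the polyadenylation signal and treats a run of at least ten A residues as the poly(A) stretch, neither of which is stated in the report, and the function name is hypothetical.

```python
import re


def find_polya_features(seq: str, offset: int = 1, min_run: int = 10,
                        signal: str = "AATAAA"):
    """Return 1-based positions (polya_start, signal_start) within a sequence window.

    offset is the position of the first nucleotide of seq in the full insert.
    signal_start is None when no signal lies upstream of the poly(A) run.
    """
    seq = seq.upper().replace(" ", "")
    tail = re.search("A{%d,}" % min_run, seq)
    if tail is None:
        return None, None
    hit = seq.rfind(signal, 0, tail.start())  # last signal upstream of the tail
    return offset + tail.start(), (offset + hit if hit != -1 else None)


# Rows 2301 and 2351 of the sequence above
window = ("ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA "
          "AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA")
print(find_polya_features(window, offset=2301))
```

With 1-based counting this yields 2342 and 2319, each one higher than the 2341 and 2318 given in the header. That suggests, but does not prove, that the report counts positions from zero.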



BLAST Results 



WO 01/12659 



PCT/IB00/01496 



Entry HSAC2099 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Genomic sequence from Human 9q34; HTGS 
phase 1, 2 unordered pieces. 

Score = 5055, P = 0.0e+00, identities = 1011/1011 
8 exons Bp 104219-116190 



Medline entries 



95157530: 

Cloning and expression of a Xenopus gene that prevents mitotic 
catastrophe in fission yeast. 



Peptide information for frame 1 



ORF from 184 bp to 1449 bp; peptide length: 422 
Category: strong similarity to known protein 
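
The peptide can be re-derived from the nucleotide sequence by translating the ORF with the standard genetic code. The Python sketch below builds the codon table from the conventional TCAG ordering and translates two short windows copied from the listing: nucleotides 184-198 (start of the ORF) and 1441-1452 (its last three codons plus the following stop). The helper names are illustrative and not part of the original analysis.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring spaces and a trailing partial codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGGGGAAGGCGAAG"))  # nucleotides 184-198
print(translate("GACGACGCCTAG"))     # nucleotides 1441-1452
```

The first window gives MGKAK and the second DDA followed by a stop, matching the first and last residues of the peptide listed for this frame.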



1 MGKAKVPASK RAPSSPVAKP GPVKTLTRKK NKKKKRFWKS KAREVSKKPA 
51 SGPGAVVRPP KAPEDFSQNW KALQEWLLKQ KSQAPEKPLV ISQMGSKKKP 
101 KIIQQNKKET SPQVKGEEMP AGKDQEASRG SVPSGSKMDR RAPVPRTKAS 
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD 
201 PADIEAAIGP EAAKIARKQL GQSEGSVSLS LVKEQAFGGL TRALALDCEM 
251 VGVGPKGEES MAARVSIVNQ YGKCVYDKYV KPTEPVTDYR TAVSGIRPEN 
301 LKQGEELEVV QKEVAEMLKG RILVGHALHN DLKVLFLDHP KKKIRDTQKY 
351 KPFKSQVKSG RPSLRLLSEK ILGLQVQQAE HCSIQDAQAA MRLYVMVKKE 
401 WESMARDRRP LLTAPDHCSD DA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p10, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7p10, frame 1 



Report for DKFZphtes3_7p10.1 



[LENGTH] 422 

[MW] 46671.91 

[pi] 9.79 

[HOMOL] PIR:S53818 XPMC2 protein - African clawed frog 7e-96 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42 

[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276C] 2e-19 

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, 

YGL094C] 7e-13 

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. 

cerevisiae, YGL094c] 7e-13 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10 

[PROSITE] RGD 1 

[PROSITE] MYRISTYL 4 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 8 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 11.37 % 
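
The LOW_COMPLEXITY figure can be reproduced from the SEG mask rows of this report: it is the number of masked (x) positions divided by the peptide length. The Python sketch below uses run lengths of 18, 12, 6 and 12 as read from the SEG rows of this OCR text, so the counts themselves are an assumption; the function name is illustrative.

```python
def masked_percentage(seg_runs: list[str], length: int) -> float:
    """Percentage of residues masked as low complexity, rounded to two decimals."""
    masked = sum(run.count("x") for run in seg_runs)
    return round(100.0 * masked / length, 2)


# SEG runs as read from this report (assumed lengths 18, 12, 6 and 12)
runs = ["x" * 18, "x" * 12, "x" * 6, "x" * 12]
print(masked_percentage(runs, 422))
```

Forty-eight masked residues out of 422 give 11.37 %, in agreement with the keyword line.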

SEQ MGKAKVPASKRAPSSPVAKPGPVKTLTRKKNKKKKRFWKSKAREVSKKPASGPGAVVRPP 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ KAPEDFSQNWKALQEWLLKQKSQAPEKPLVISQMGSKKKPKIIQQNKKETSPQVKGEEMP 
SEG xxxxxxxxxxxx 



PRD cccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeecccccccccccccee 

SEQ AGKDQEASRGSVPSGSKMDRRAPVPRTKASGTEHNKKGTKERTNGDIVPERGDIEHKKRK 

SEG xxxxxx 

PRD ecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ AKEAAPAPPTEEDIWFDDVDPADIEAAIGPEAAKIARKQLGQSEGSVSLSLVKEQAFGGL 

SEG xxxxxxxxxxxx 

PRD hhhhcccccccceeeecccccchhhhhhccchhhhhhhhhhcccccchhhhhhhhhhhhh 

SEQ TRALALDCEMVGVGPKGEESMAARVSIVNQYGKCVYDKYVKPTEPVTDYRTAVSGIRPEN 

SEG 

PRD hhhcccccccccccccchhhhhhhhhccccccceeeeeeecccccccccccccccccccc 

SEQ LKQGEELEVVQKEVAEMLKGRILVGHALHNDLKVLFLDHPKKKIRDTQKYKPFKSQVKSG 

SEG 

PRD ccccchhhhhhhhhhhhhhcceeeeccchhhhhhhhhcccccccccceeecccccccccc 

SEQ RPSLRLLSEKILGLQVQQAEHCSIQDAQAAMRLYVMVKKEWESMARDRRPLLTAPDHCSD 

SEG 

PRD chhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ DA 
SEG 

PRD cc 



Prosite for DKFZphtes3_7p10.1 

PS00002   51->55    GLYCOSAMINOGLYCAN   PDOC00002 
PS00004   107->111  CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   156->160  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   9->12     PKC_PHOSPHO_SITE    PDOC00005 
PS00005   27->30    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   46->49    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   96->99    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   347->350  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   359->362  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   363->366  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   368->371  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   136->140  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   150->154  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   163->167  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   190->194  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   383->387  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   413->417  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   343->351  TYR_PHOSPHO_SITE    PDOC00007 
PS00007   342->351  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   130->136  MYRISTYL            PDOC00008 
PS00008   151->157  MYRISTYL            PDOC00008 
PS00008   221->227  MYRISTYL            PDOC00008 
PS00008   239->245  MYRISTYL            PDOC00008 
PS00016   171->174  RGD                 PDOC00016 

(No Pfam data available for DKFZphtes3_7p10.1) 

DKFZphtes3_7p9 



group: nucleic acid management 

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain 
10 protein NDP52. 

The nuclear domain 10 (ND10), also described as POD or Kr bodies, is involved in the development of 
acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this 
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed 
upon viral infection and interferon treatment. ND10 plays an important role in the viral life 
cycle. 

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell 
attachment site. This protein seems to be a novel part of the ND10 complex. 

The new protein can find application in modulation of viral infections and tumour events. 



similarity to nuclear domain 10 protein NDP52 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="329.1 cR from top of Chr12 linkage group" 
Insert length: 3003 bp 

Poly A stretch at pos. 2957, no polyadenylation signal found 



1 AAGGTGAGGG GAACAGCTGA TCCGTCTGTT GGGAGGACAG ATATCTCAAG 
51 GCCAGGATGG AAGAATCACC ACTAAGCCGG GCACCATCCC GTGGTGGAGT 
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTGGAAT 
151 GTCACTACAC CCTTCCCCCA GGCACCATGC CCAGTGCCAG TGACTGGATT 
201 GGCATCTTCA AGGTGGAGGC TGCCTGTGTT CGGGATTACC ACACATTTGT 
251 GTGGTCTTCC GTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA 
301 GTGTCCAGTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC 
351 CAGTTCCGAT ATGTGAACCG CCAGGGCCAG GTGTGTGGGC AGAGCCCCCC 
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TGAACTGGTG ACCCTGGAGG 
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG 
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATGCA 
551 GCTGAAGCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC 
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG 
651 ATGGAACAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA 
701 GAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG 
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA 
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGGCCCTGA CTCGGGAACA 
851 AGAGAAGCTC CTTGGGCAAC TGAAAGAAGT ACAAGCAGAC AAGGAGCAAA 
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG 
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA 
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC 
1051 AGCAGCGGGT GGCCGAGCTG GAGCCCTTGA AGGAGCAGCT TCGAGGGGCC 
1101 CAGGAGCTTG CAGCCTCAAG CCAGCAGAAA GCCACCCTTC TTGGGGAGGA 
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC GAACTACACC 
1201 GCAGCCGCCT GGAAGTGGCT GAAGTTAACG GCAGGCTGGC TGAGCTCGGT 
1251 TTGCACTTGA AGGAAGAAAA ATGCCAATGG AGCAAGGAGC GGGCAGGGCT 
1301 GCTGCAGAGT GTGGAGGCAG AGAAGGACAA GATCCTGAAG CTGAGTGCAG 
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGGAC CCAAAACCAA 
1401 GTGTTCAAGA CTGAGCTGGC CCGGGAGAAG GATTCTAGCC TGGTACAGTT 
1451 GTCAGAAAGT AAGCGGGAGC TGACAGAGCT GCGGTCAGCC CTGCGTGTGC 
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG AGAAACAGGA ATTGCTAGAG 
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTGGCAG ATGAGAAGTG 
1601 GAATGAGGAT GCCACCACAG AGGATGAGGA GGCCGCTGTG GGGCTGAGCT 
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG 
1701 CTCCCACCCT ATGGCCTTTG TGAGCGTGGA GACCCAGGCT CCTCTCCTGC 
1751 TGGGCCTCGA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA 
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG 
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA 
1901 GGAGGCCAAC TTACTGCTTC CTGAACTGGG CAGTGCCTTC TATGACATGG 
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TGGGGGCCCT 
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC 
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT 
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT 
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT 
2201 AGGTTTCATG CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC 
2251 CTAAGAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACTTC 
2301 ATCCTCTCCT ACCTGGCTCT TTTGTCCCAG GGAGGGGTCC TGTTCGGAAG 

2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGGAGGAA CCGGGATGGA 
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTGGTTGC 
2451 TTCTGATACA CAGCCACTGC ACACACACAC TCACACTCAC ACTCCCTTGT 
2501 CTGATGCCCC AAAGCCAATT CCTGGGGCAC CCTACCCTCT CTTATTTGGA 
2551 GTTTCCGTTG GTTTACCTGA GTTTTCTCTG GGGTCTGCAC AGAGGCAGCA 
2601 GCATGGACAT CATGGCCTCT CAGGTCCCTT TTGGTTCTCA GTTTCATTGG 
2651 TTCCTCTTTC TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC 
2701 CATAACCTTA GGTATTCAGT TTGGAGGGGT TTTTTGTATT TTTGAGGATT 
2751 CCTGTATTCT GTATCCTCTC CTCGCATCTC CTCACATGGA AAGAAATAAT 
2801 GTATTTGTGC CTTCTGTGAG GAATGGGGGG AACAAGTGGT CCCAGGTATC 
2851 CCCATTTCCA AGGCCCCCCT CCCTCTCCAG GTCCCCCCAC AGCAATAAAA 
2901 GCTTCCCCCT GATATCCATC CCTTTGTAGT TTGAACAAAT ATATTTATAT 
2951 GATATGTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3001 AAA 



BLAST Results 



Entry HS189353 from database EMBL: 
human STS WI-11261. 
Score = 2191, P = 1.4e-92, identities = 463/485 
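
The identity figures in these BLAST summaries are plain ratios of matching positions to aligned positions. The Python sketch below shows both steps: counting identities in a pair of aligned strings and converting counts to a percentage. It truncates rather than rounds, because the BLASTP blocks in this document appear to do so (for example 113/227 is printed as 49%); that reading, like the function names, is an assumption.

```python
def count_identities(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical_columns, aligned_columns) for two gapped, equal-length strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    matches = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return matches, len(query)


def percent_floor(matches: int, aligned: int) -> int:
    """Percentage truncated to a whole number."""
    return (100 * matches) // aligned


print(percent_floor(463, 485))               # identities = 463/485 for the STS hit above
print(count_identities("ACGT-A", "ACCTTA"))  # toy alignment with one gap
```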



Medline entries 



95310349: 

Molecular characterization of NDP52, a novel protein of the 
nuclear domain 10, which is redistributed upon virus 
infection and interferon treatment. 

97375672: 

Cellular localization, expression, and structure of the nuclear 
dot protein 52. 



Peptide information for frame 3 



ORF from 57 bp to 2129 bp; peptide length: 691 
Category: similarity to known protein 
Prosite motifs: RGD (557-560) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (475-497) 
LEUCINE_ZIPPER (482-504) 
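
These motif positions can be reproduced with regular expressions. The Python sketch below uses the PROSITE leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L and the RGD tripeptide, and scans two short windows copied from the peptide (residues 161-190 and 551-560). The listed ranges appear to give the 1-based start and the start plus the motif length; that convention is inferred from this listing rather than stated in it, and the helper names are illustrative.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported
LEUCINE_ZIPPER = (re.compile(r"(?=L.{6}L.{6}L.{6}L)"), 22)
RGD = (re.compile(r"(?=RGD)"), 3)


def find_motif(motif, peptide: str, offset: int = 1) -> list[tuple[int, int]]:
    """Return (start, start + length) pairs; offset is the position of peptide[0]."""
    pattern, length = motif
    return [(offset + m.start(), offset + m.start() + length)
            for m in pattern.finditer(peptide)]


print(find_motif(LEUCINE_ZIPPER, "NDLMQLKLQLEGQVTELRSRVQELERALAT", offset=161))
print(find_motif(RGD, "PYGLCERGDP", offset=551))
```

Under the assumed convention the two calls return (163, 185) and (557, 560), matching the first leucine zipper and the RGD site listed above.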



1 MEESPLSRAP SRGGVNFLNV ARTYIPNTKV ECHYTLPPGT MPSASDWIGI 
51 FKVEAACVRD YHTFVWSSVP ESTTDGSPIH TSVQFQASYL PKPGAQLYQF 
101 RYVNRQGQVC GQSPPFQFRE PRPMDELVTL EEADGGSDIL LVVPKATVLQ 
151 NQLDESQQER NDLMQLKLQL EGQVTELRSR VQELERALAT ARQEHTELME 
201 QYKGISRSHG EITEERDILS RQQGDHVARI LELEDDIQTI SEKVLTKEVE 
251 LDRLRDTVKA LTREQEKLLG QLKEVQADKE QSEAELQVAQ QENHHLNLDL 
301 KEAKSWQEEQ SAQAQRLKDK VAQMKDTLGQ AQQRVAELEP LKEQLRGAQE 
351 LAASSQQKAT LLGEELASAA AARDRTIAEL HRSRLEVAEV NGRLAELGLH 
401 LKEEKCQWSK ERAGLLQSVE AEKDKILKLS AEILRLEKAV QEERTQNQVF 
451 KTELAREKDS SLVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM 
501 RKLEARLEKV ADEKWNEDAT TEDEEAAVGL SCPAALTDSE DESPEDMRLP 
551 PYGLCERGDP GSSPAGPREA SPLVVISQPA PISPHLSGPA EDSSSDSEAE 
601 DEKSVLMAAV QSGGEEANLL LPELGSAFYD MASGFTVGTL SETSTGGPAT 
651 PTWKECPICK ERFPAESDKD ALEDHMDGHF FFSTQDPFTF E 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_7p9, frame 3 

PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307, 
P = 7.7e-28 

TREMBL:AB008852_1 gene: "NDP"; product: "NDP52"; Bos taurus mRNA for 
NDP52, complete cds., N = 2, Score = 302, P = 4e-27 

TREMBL:AC004549_1 gene: "WUGSC:H_RG459N13.1"; product: "TXBP151"; Homo 
sapiens BAC clone RG459N13 from 7p15, complete sequence., N = 2, Score 
= 275, P = 2.3e-25 

PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25 

TREMBL:DM35816_4 gene: "zip"; product: "nonmuscle myosin-II heavy 
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip) 
gene, complete cds., N = 1, Score = 254, P = 1.4e-17 



>PIR:A56733 nuclear domain 10 protein NDP52 - human 
Length = 446 

HSPs: 



Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28 
Identities = 104/323 (32%), Positives = 158/323 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 
          V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 
          + S VQF+A YLPK + YQF YV+ G V G S PFQFR D LV + 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFRPENEEDILVVTTQ-- 139 

Query: 135 GGSDILLVVPKATVLQNQ-LDES---QQERNDLMQLKLQLEGQVTE-LRSRVQELERALA 189 
           G + + K +NQ L +S Q++N MQ +LQ + + E L+S ++LE + 
Sbjct: 140 GEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVK 199 

Query: 190 TARQE-HTELMEQYKGISRSHGEITEERDI-LSRQQGDHVARILELEDDIQTISEKVLTK 247 
           + TEL+ QK++ E+I + +Q + E+E +Q +K T+ 
Sbjct: 200 EQKDYWETELL-QLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEMEKLVQGDQDK--TE 256 

Query: 248 EVE-LDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSW 306 
           ++E L + D + EQ K + L++ +Q+E QQE N DL + S 
Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQNETTAMKKQQELMDENFDLSKRLSE 316 

Query: 307 QEEQSAQAQRLKDKVAQMKDTLGQAQQRV 335 
           E QR K + + + D L + R+ 
Sbjct: 317 NEIICNALQRQKERLEGENDLLKRENSRL 345 




Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27 
Identities = 98/337 (29%), Positives = 163/337 (48%) 

Query: 15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74 
          V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P 
Sbjct: 23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82 

Query: 75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134 
          + S VQF+A YLPK + YQF YV+ G V G S PFQFR P +E 
Sbjct: 83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFR---PENE-------- 130 

Query: 135 GGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQE 194 
           DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE 
Sbjct: 131 --EDILVVTT-----QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQE 182 

Query: 195 HTELMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDR 253 
           E ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+ 
Sbjct: 183 ELETLQS----------INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQ 232 

Query: 254 LRDTVKALTREQEKLL--GQLKEVQAD---KEQSEAELQVAQQENHHLNLDLKEAKSWQE 308 
           L+ + +E EKL+ Q K Q + KE L + +Q L+ + Q 
Sbjct: 233 LQAQLSTQEKEMEKLVQGDQDKTEQLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQN 292 

Query: 309 EQSA--QAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQEL 351 
           E +A + Q L D+ + L + + L+ KE+L G +L 
Sbjct: 293 ETTAMKKQQELMDENFDLSKRLSENEIICNALQRQKERLEGENDL 337 




Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06 
Identities = 53/227 (23%), Positives = 113/227 (49%) 

Query: 138 DILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTE 197 
           DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE E 
Sbjct: 132 DILVVTT-----QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQEELE 185 

Query: 198 LMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDRLRD 256 
           ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+L+ 
Sbjct: 186 TLQS----------INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQLQA 235 

Query: 257 TVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSWQEEQSAQAQR 316 
           + +E EKL VQ D++++E +L+ ++EN HL L L E + Q++ ++ 
Sbjct: 236 QLSTQEKEMEKL------VQGDQDKTE-QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQ 288 






Query: 


317 


Sbjct: 


289 


Score 


= 103 


Identities ! 


Query: 


299 


Sbjct: 


141 


Query: 


355 


Sbjct: 


200 


Query: 


415 


Sbjct: 


257 


Query: 


471 


Sbjct: 


314 


Query: 


523 


Sbjct: 


369 


Scare 


= 64 


Identities • 


Query: 


651 


Sbjct: 


417 


Score 


= 64 


Identities > 


Query: 


470 


Sbjct: 


154 


Query: 


516 


Sbjct: 


214 


Score 


= 47 


Identities ■ 


Query: 


631 


Sbjct: 


374 



MK + Q+ + E L ++L + + A +QK L GE 
hMK KQQELMDENFDLSKRLSEHEIICNALQRQKERLEGE 334 



123/278 (44%) 



+++E + +E + Q LKD 



Q+ 



L+ ++ E D + L L+ + 



+ Q++ ELE L + + 



EL 



-RLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKR 47 0 
4LE+ V E+ QN+ T + ++++ SKR 



GL+ 



LQ++KE+L+ E +LL 



SP 



++ +RL +N T DE A 

-KRENSRLLSYMGLDFNSLPYQVPTSDEGGA 368 



+C+ 



++ PL 



3.6 bits). Expect = 7.7e-28, Sum P(2) 
13/29 (44%), Positives = 17/29 (58%) 



= 7.7e-28 



CPIC + FPA ++K 



EDH+ 



5.8e+00, Sum P(2) 
ives = 45/90 (50%) 



1.0e+00 



+E EL+ + 



LQK+ 



+Q F. 



-KQELLEYMRKLEARLE-KVADEK— W- 
KQE LE ++ + +LE KV ++K W 



515 



N+ ++E+E+ + + 



A L+ EE 



7.1 bits), Expect = 4.6e-26, Sura P(2) 
11/30 (36%), Posxtives = 17/30 (56%) 



4 ,6e-26 



+A G 



+ E+S+ P + K+CPICK 



Pedant information for DKFZphtes3_7p9, frame 3 
Report for DKFZphtes3_7p9.3 

[LENGTH] 691 
[MW] 77336.52 
[pi] 4.77 
[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 2e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YJL074c] 4e-07 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-05 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-05 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-04 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-04 
[BLOCKS] BL00682B ZP domain proteins 
[EC] 3.6.1.32 Myosin ATPase 1e-13 
[PIRKW] nucleus 6e-10 
[PIRKW] phosphotransferase 2e-07 
[PIRKW] duplication 9e-07 
[PIRKW] citrulline 1e-09 
[PIRKW] tandem repeat 1e-13 
[PIRKW] heart 5e-11 
[PIRKW] endocytosis 5e-09 
[PIRKW] polymorphism 3e-06 
[PIRKW] cornified cell envelope 1e-06 
[PIRKW] transmembrane protein 6e-12 
[PIRKW] serine/threonine-specific protein kinase 2e-07 
[PIRKW] cell wall 1e-06 
[PIRKW] zinc finger 5e-09 
[PIRKW] metal binding 5e-09 
[PIRKW] DNA binding 8e-08 
[PIRKW] muscle contraction 1e-11 
[PIRKW] IgG constant region-binding 1e-06 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] actin binding 1e-13 
[PIRKW] mitosis 9e-09 
[PIRKW] microtubule binding 9e-09 
[PIRKW] ATP 1e-13 
[PIRKW] thick filament 1e-10 
[PIRKW] phosphoprotein 1e-13 
[PIRKW] epidermis 1e-06 
[PIRKW] leucine zipper 1e-07 
[PIRKW] glycoprotein 4e-07 
[PIRKW] skeletal muscle 4e-10 
[PIRKW] disulfide bond 1e-07 
[PIRKW] calcium binding 1e-09 
[PIRKW] alternative splicing 1e-10 
[PIRKW] coiled coil 1e-13 
[PIRKW] P-loop 1e-13 
[PIRKW] heptad repeat 6e-10 
[PIRKW] methylated amino acid 1e-13 
[PIRKW] basement membrane 3e-06 
[PIRKW] immunoglobulin receptor 2e-07 
[PIRKW] peripheral membrane protein 5e-09 
[PIRKW] dimer 1e-07 
[PIRKW] cardiac muscle 1e-10 
[PIRKW] extracellular matrix 3e-06 
[PIRKW] hydrolase 1e-13 
[PIRKW] microtubule 6e-10 
[PIRKW] muscle 2e-09 
[PIRKW] membrane protein 3e-06 
[PIRKW] EF hand 1e-09 
[PIRKW] cytoskeleton 6e-12 
[PIRKW] hair 1e-09 
[PIRKW] calmodulin binding 5e-09 
[PIRKW] Golgi apparatus 3e-08 
[SUPFAM] myosin heavy chain 1e-13 
[SUPFAM] conserved hypothetical P115 protein 1e-08 
[SUPFAM] hypothetical protein YJL074C 5e-07 
[SUPFAM] centromere protein E 9e-09 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07 
[SUPFAM] calmodulin repeat homology 1e-09 
[SUPFAM] myosin motor domain homology 1e-13 
[SUPFAM] alpha-actinin actin-binding domain homology 3e-13 
[SUPFAM] tropomyosin 3e-07 
[SUPFAM] plectin 3e-13 
[SUPFAM] trichohyalin 1e-09 
[SUPFAM] pleckstrin repeat homology 4e-06 
[SUPFAM] ribosomal protein S10 homology 3e-13 
[SUPFAM] giantin 3e-08 
[SUPFAM] protein kinase homology 2e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-06 
[SUPFAM] involucrin 1e-06 
[SUPFAM] kinesin motor domain homology 9e-09 
[SUPFAM] human early endosome antigen 1 5e-09 
[SUPFAM] unassigned kinesin-related proteins 8e-08 
[SUPFAM] M5 protein 3e-08 
[SUPFAM] cytoskeletal keratin 3e-08 
[PROSITE] LEUCINE_ZIPPER 3 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 25 
[PROSITE] PKC_PHOSPHO_SITE 6 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 9.12 % 
[KW] COILED_COIL 39.36 % 



SEQ MEESPLSRAPSRGGVNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRD 
SEG 
PRD cccccccccccccceeeecceeeeeccccceeeeeccccccccccceeeeeeeeeecccc 
COILS 

SEQ YHTFVWSSVPESTTDGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFRE 
SEG 
PRD eeeeeeeecccccccccchhhhhhhhhhhhccccccceeeeecccccccccccccccccc 
COILS 

SEQ PRPMDELVTLEEADGGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSR 
SEG 
PRD cccccceeehhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ VQELERALATARQEHTELMEQYKGISRSHGEITEERDILSRQQGDHVARILELEDDIQTI 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ SEKVLTKEVELDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDL 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KEAKSWQEEQSAQAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELAASSQQKAT 
SEG xx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS CCCCC . . CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC . CCCCCCCCCCCCCCCCCCCC 



SEQ LLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAGLLQSVE 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCC CCCCCCCCCCC 

SEQ AEKDKILKLSAEILRLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKRELTELRSALR 

SEG ; 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCC 

SEQ VLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNEDATTEDEEAAVGLSCPAALTDSE 

SEG . xxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 


SEQ DESPEDMRLPPYGLCERGDPGSSPAGPREASPLVVISQPAPISPHLSGPAEDSSSDSEAE
SEG xxxxxxxxxxx
PRD hhhhccccccccccccccccccccccccccceeeeeeccccccccccccccccccccchh
COILS

SEQ DEKSVLMAAVQSGGEEANLLLPELGSAFYDMASGFTVGTLSETSTGGPATPTWKECPICK
SEG xx
PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc
COILS

SEQ ERFPAESDKDALEDHMDGHFFFSTQDPFTFE
SEG
PRD cccccccchhhhhhhccccceeecccccccc
COILS
Prosite for DKFZphtes3_7p9.3

PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005
PS00005 257->260 PKC_PHOSPHO_SITE PDOC00005
PS00005 468->471 PKC_PHOSPHO_SITE PDOC00005
PS00005 652->655 PKC_PHOSPHO_SITE PDOC00005
PS00005 667->670 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006
PS00006 68->72 CK2_PHOSPHO_SITE PDOC00006
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006
PS00006 239->243 CK2_PHOSPHO_SITE PDOC00006
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006
PS00006 305->309 CK2_PHOSPHO_SITE PDOC00006
PS00006 376->380 CK2_PHOSPHO_SITE PDOC00006
PS00006 383->387 CK2_PHOSPHO_SITE PDOC00006
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 537->541 CK2_PHOSPHO_SITE PDOC00006
PS00006 539->543 CK2_PHOSPHO_SITE PDOC00006
PS00006 543->547 CK2_PHOSPHO_SITE PDOC00006
PS00006 593->597 CK2_PHOSPHO_SITE PDOC00006
PS00006 595->599 CK2_PHOSPHO_SITE PDOC00006
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006
PS00006 612->616 CK2_PHOSPHO_SITE PDOC00006
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006
PS00006 652->656 CK2_PHOSPHO_SITE PDOC00006
PS00006 667->671 CK2_PHOSPHO_SITE PDOC00006
PS00006 683->687 CK2_PHOSPHO_SITE PDOC00006
PS00008 39->45 MYRISTYL PDOC00008
PS00008 107->113 MYRISTYL PDOC00008
PS00008 204->210 MYRISTYL PDOC00008
PS00008 414->420 MYRISTYL PDOC00008
PS00008 561->567 MYRISTYL PDOC00008
PS00008 613->619 MYRISTYL PDOC00008
PS00016 557->560 RGD PDOC00016
PS00029 163->185 LEUCINE_ZIPPER PDOC00029
PS00029 475->497 LEUCINE_ZIPPER PDOC00029
PS00029 482->504 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphtes3_7p9.3)
DKFZphtes3_8e24



group: signal transduction 

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to
yeast YGL099w and mouse MMR1 putative GTP-binding proteins.

GTP-binding proteins are involved in various signal transduction pathways, transferring the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 



strong similarity to guanine nucleotide binding proteins 

complete cDNA, complete cds, potential start at Bp 21, EST hits

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 3290 bp 

Poly A stretch at pos. 3269, polyadenylation signal at pos. 3251 



1 CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG 
51 TCGCTGGGAC GGGCCCTTAT GCGCCATCAG ACTCAGCGGA GCCGAAGCCA 
101 TCGTCACACT GACTCCTGGT TGCACACAAG TGAACTCAAT GATGGCTATG 
151 ATTGGGGTCG TCTTAATCTT CAGTCAGTGA CTGAACAGAG CTCCCTTGAT 
201 GACTTCCTTG CTACTGCAGA ACTTGCAGGA ACAGAGTTTG TAGCTGAAAA 
251 ACTTAATATT AAGTTTGTGC CTGCTGAGGC TAGAACTGGA CTACTGTCTT 
301 TCGAGGAGAG CCAGAGAATT AAGAAGCTCC ATGAAGAAAA CAAACAGTTC 
351 TTGTGTATAC CGAGGAGACC AAACTGGAAC CAAAATACTA CCCCAGAAGA 
401 ACTCAAACAA GCAGAGAAAG ATAACTTTCT AGAATGGAGA CGTCAGCTTG 
451 TCCGGCTAGA AGAGGAACAG AAGCTGATAT TGACTCCATT TGAACGAAAT 
501 TTGGACTTTT GGCGCCAGCT CTGGAGAGTC ATTGAGAGAA GTGATATTGT 
551 GGTCCAGATA GTAGATGCTC GAAACCCACT CCTGTTTAGA TGTGAGGATT 
601 TGGAATGTTA TGTGAAAGAA ATGGATGCCA ATAAGGAGAA CGTCATTCTG 
651 ATCAACAAGG CAGACTTGCT GACTGCTGAG CAGCGGAGTG CCTGGGCCAT 
701 GTACTTCGAA AAAGAAGATG TGAAGGTTAT TTTCTGGTCA GCTTTGGCCG 
751 GAGCCATTCC CCTGAATGGT GACTCTGAGG AAGAGGCAAA CAGAGATGAT 
801 AGACAAAGCA ACACAACTGA GTTTGGACAT TCCAGTTTCG ACCAGGCTGA
851 AATTTCCCAC AGTGAATCCG AACATCTCCC AGCTAGGGAT TCTCCTTCAC 
901 TTAGTGAAAA TCCCACAACG GATGAAGATG ACAGTGAGTA TGAGGACTGT 
951 CCAGAGGAGG AGGAAGACGA CTGGCAGACG TGCTCAGAAG AAGACGGTCC 
1001 CAAGGAAGAG GACTGCAGCC AGGACTGGAA GGAAAGCTCT ACTGCAGATT 
1051 CTGAGGCTCG GAGCAGGAAA ACCCCACAGA AGAGGCAGAT ACACAATTTT 
1101 AGCCATCTGG TATCCAAGCA GGAGTTACTG GAGCTCTTTA AGGAGCTACA 
1151 CACTGGGAGA AAGGTGAAAG ATGGGCAACT TACGGTCGGA CTGGTGGGCT 
1201 ACCCTAATGT TGGTAAGAGT TCAACAATCA ACACCATCAT GGGCAACAAG 
1251 AAAGTATCTG TGTCTGCCAC ACCTGGTCAC ACAAAGCACT TTCAGACTCT 
1301 CTATGTGGAG CCTGGCCTCT GCCTGTGTGA CTGTCCTGGC TTGGTGATGC 
1351 CATCTTTTGT GTCTACCAAG GCAGAAATGA CTTGCAGCGG AATCCTCCCA 
1401 ATTGATCAGA TGAGAGATCA TGTTCCTCCT GTATCACTAG TTTGCCAGAA 
1451 TATTCCAAGA CATGTTTTAG AAGCTACCTA TGGCATTAAC ATCATAACGC 
1501 CTAGAGAGGA TGAAGATCCC CACCGACCTC CAACATCGGA AGAACTGTTG 
1551 ACAGCTTATG GATACATGCG AGGATTCATG ACAGCGCATG GACAGCCAGA 
1601 CCAGCCTCGA TCTGCGCGCT ACATCCTGAA GGACTATGTC AGTGGTAAGC 
1651 TGCTGTACTG CCATCCTCCT CCTGGAAGAG ATCCTGTAAC TTTTCAGCAT 
1701 CAACACCAGC GACTCCTAGA GAACAAAATG AACAGTGATG AAATAAAAAT 
1751 GCAGCTAGGC AGAAATAAAA AAGCAAAGCA GATTGAAAAT ATCGTTGACA 
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGGAGTCCAG 
1851 GCTGTGATGG GTTACAAGCC CGGGAGTGGT GTAGTGACTG CATCCACTGC 
1901 GAGCTCTGAG AACGGGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA 
1951 ATAAAAAAGA AAAAAGTCGT AGACTCTACA AGCACCTGGA TATGTGAGGT 
2001 TGGGCTGCAA CAGAAATGTC ATCTGCATTG TGCAGATGGA AAAGAGCAGA 
2051 AGCTGCCTGT TGCCTGTGGA ACTGTCCCAA GACACTAGCA CTGTAGAACG 
2101 GGCCCTGCTC TTGCAGAGCA CGGCTGCACC CAACAGTCTC CATGTCAAGA 
2151 CCAAGGGCCT CCTGGAAACA CCAGCTCTGA CAAAAAGGAG TCATCTGGGA 
2201 GCCCGAGAAT CCTACTCCTG GCCGGGCACA GTGGCTCACG CACCAACATG 
2251 GAGAAACCCC GTCTCTACTA AAAATACAAA AAAATTAGCC AGGCGTGGTG
2301 GCGCGCACCT GTAATCCCAG CTACTCGGGA GGCTGAGGCA GGAGAATCAC
2351 TTGAACCAGG GAGGCAGAGT TTGCAGTGAA TGGAGATTGC GCCGCTGCAC
2401 TCCAGCCTGG GCGACAGAGT GAGACTGCAT CACAAGAAAA AAAATTTGCA
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC
2501 CATTCTGAGT GTCCTAGTTG GGTTCCTCCG ACTCTAAACA AGGGACTTGG
2551 GTTCAGTTAG TGTACAGCGG GGGCTCACGT CCACTAAGGA ACATGTAGAA
2601 TGTAACCACC GGGTGACAGG GAAGCTGCGG TATTTACTAC CTAGCCCCCA
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT GTCCAGAATA AGCAAATCTC
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGGGAA
2751 GATTGAGGTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTGTA
2801 GCGATGGTCA CACAACTCTG AATATGCTTA AGACCATTGA ATTACACACT
2851 TTACGTTGGT GAATTGTATG GTATGTAAAT TATAGTTCAA TAACATAGTT
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATGTGG TTTGGATCTG
2951 TGTCCTCACC GAGTCTCATG TTGAAATGTA AGCCCCCTGG TGGGAGGCGA
3001 TGGGATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCGCTCAG
3051 TGCTGTTCTC CTGATATTGA GTCCTCATCA CATCTGGTTG CTTCAAAGTG
3101 TGTGGTGCCT CCCCTCTGTC TCCCTCCTGC TCTGGCCATA TAAGATGTGC
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAAGTTTCCT GAGGCCTCCC
3201 TAGAAGCAAA AGCTGCTGTG CTTCCTGTAC CATCTACTGG ACCGTGAGCC
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 21 bp to 1994 bp; peptide length: 658
Category: strong similarity to known protein
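
The ORF coordinates and peptide length quoted above can be cross-checked against the nucleotide listing. The following Python sketch is illustrative only and is not part of the original annotation pipeline: the helper names `orf_peptide_length` and `translate` are hypothetical, the standard genetic code is assumed, and the 50-base string is copied from the first line of the DKFZphtes3_8e24 sequence.

```python
# Standard genetic code with bases ordered T, C, A, G (an assumption of this
# sketch); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna: str, start: int) -> str:
    """Translate from 1-based position `start` until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 50 bases of the DKFZphtes3_8e24 insert, as listed in this document.
first_50 = "CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG"

print(orf_peptide_length(21, 1994))  # codons between bp 21 and bp 1994
print(translate(first_50, 21))       # first residues of the frame starting at bp 21
```

Under these assumptions the frame beginning at bp 21 opens with the same ten residues as the listed peptide, and the coordinate span corresponds to 658 codons.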



1 MGRRRAPAGG SLGRALMRHQ TQRSRSHRHT DSWLHTSELN DGYDWGRLNL
51 QSVTEQSSLD DFLATAELAG TEFVAEKLNI KFVPAEARTG LLSFEESQRI
101 KKLHEENKQF LCIPRRPNWN QNTTPEELKQ AEKDNFLEWR RQLVRLEEEQ
151 KLILTPFERN LDFWRQLWRV IERSDIVVQI VDARNPLLFR CEDLECYVKE
201 MDANKENVIL INKADLLTAE QRSAWAMYFE KEDVKVIFWS ALAGAIPLNG
251 DSEEEANRDD RQSNTTEFGH SSFDQAEISH SESEHLPARD SPSLSENPTT
301 DEDDSEYEDC PEEEEDDWQT CSEEDGPKEE DCSQDWKESS TADSEARSRK
351 TPQKRQIHNF SHLVSKQELL ELFKELHTGR KVKDGQLTVG LVGYPNVGKS
401 STINTIMGNK KVSVSATPGH TKHFQTLYVE PGLCLCDCPG LVMPSFVSTK
451 AEMTCSGILP IDQMRDHVPP VSLVCQNIPR HVLEATYGIN IITPREDEDP
501 HRPPTSEELL TAYGYMRGFM TAHGQPDQPR SARYILKDYV SGKLLYCHPP
551 PGRDPVTFQH QHQRLLENKM NSDEIKMQLG RNKKAKQIEN IVDKTFFHQE
601 NVRALTKGVQ AVMGYKPGSG VVTASTASSE NGAGKPWKKH GNRNKKEKSR
651 RLYKHLDM



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_8e24, frame 3

SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN
CHROMOSOME I., N = 3, Score = 560, P = 1.6e-111

PIR:S64106 hypothetical protein YGL099w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 544, P = 2.6e-105

TREMBL:CEAF3143_1 gene: "C53H9.2"; Caenorhabditis elegans cosmid
C53H9., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MMR1_MOUSE POSSIBLE GTP-BINDING PROTEIN MMR1., N = 2, Score =
311, P = 7.5e-31

>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN
CHROMOSOME I.

Length = 616

HSPs:

Score = 560 (84.0 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 119/253 (47%), Positives = 163/253 (64%)



Query: 12 LGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLDDFLATAELAGT 71 

LGRA+ T+ R+ + H + + R L+SVT ++ LD+FL TAEL 






WO 01/12659 



PCT/IB00/01496 



Sbjct: 12 LGRAIQSDFTKNRRNRK--GGLKHIVDSDPKAH--RAALRSVTHETDLDEFLNTAELGEV 67

Query: 72 EFVAEKLNIKFVP-AEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWNQNTTPEELKQ 130

          EF+AEK N+ + E LLS EE+ R K+ E+NK L IPRRP+W+Q TT EL +

Sbjct: 68 EFIAEKQNVTVIQNPEQNPFLLSKEEAARSKQKQEKNKDRLTIPRRPHWDQTTTAVELDR 127

Query: 131 AEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQIVDARNPLLFR 190

           E+++FL WRR L +L++ + I+TPFERNL+ WRQLWRVIERSD+VVQIVDARNPL FR

Sbjct: 128 MERESFLNWRRNLAQLQDVEGFIVTPFERNLEIWRQLWRVIERSDVVVQIVDARNPLFFR 187

Query: 191 CEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWSALAGAIPLNG 250

           LE YVKE+ +K+N +L+NKAD+LT EQR+ W+ YF + ++ +F+SA A N

Sbjct: 188 SAHLEQYVKEVGPSKKNFLLVNKADMLTEEQRNYWSSYFNENNIPFLFFSARMAA-EANE 246

Query: 251 DSEEEANRDDRQSN 264

           E+ + SN

Sbjct: 247 RGEDLETYESTSSN 260

Score = 532 (79.8 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 131/323 (40%), Positives = 192/323 (59%)

Query: 340 STADSEARSRKTPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQ--LTVGLVGYPNV 397

           ST+ +E + +H+ S + + + L +F++ + + DG+ +T GLVGYPNV

Sbjct: 256 STSSNEIPESLQADENDVHS-SRIATLKVLEGIFEKFAS--TLPDGKTKMTFGLVGYPNV 312

Query: 398 GKSSTINTIMGNKKVSVSATPGHTKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSG 457

           GKSSTIN ++G+KKVSVS+TPG TKHFQT+ + + L DCPGLV PSF +T+A++ G

Sbjct: 313 GKSSTINALVGSKKVSVSSTPGKTKHFQTINLSEKVSLLDCPGLVFPSFATTQADLVLDG 372

Query: 458 ILPIDQMRDHVPPVSLVCQNIPRHVLEATYGINI-ITPREDEDPHRPPTSEELLTAYGYM 516

           +LPIDQ+R++ P +L+ + IP+ VLE Y I I I P E E P+++E+L +

Sbjct: 373 VLPIDQLREYTGPSALMAERIPKEVLETLYTIRIRIKPIE-EGGTGVPSAQEVLFPFARS 431

Query: 517 RGFMTAH-GQPDQPRSARYILKDYVSGKLLYCHPPPG--RDPVTFQHQHQRLLENKMNSD 573

           RGFM AH G PD R+AR +LKDYV+GKLLY HPPP F +H + + + SD

Sbjct: 432 RGFMRAHHGTPDDSRAARILLKDYVNGKLLYVHPPPNYPNSGSEFNKEHHQKIVSA-TSD 490

Query: 574 EIKMQLGR---NKKAKQIEN-IVDKTFFHQEN--VRALTKGVQAVM-G--YKPGSGVVTA 624

           I +L R + E+ +VD +F QEN VR + KG M G YK + +

Sbjct: 491 SITEKLQRTAISDNTLSAESQLVDDEYF-QENPHVRPMVKGTAVAMQGPVYKGRNTMQPF 549

Query: 625 STASSENGAGK-PWKKHGNRNKKEKSRRL 652

           +++ + K P G + K+R+L

Sbjct: 550 QRRLNDDASPKYPMNAQGKPLSRRKARQL 578

Score = 47 (7.1 bits), Expect = 1.3e-60, Sum P(3) = 1.3e-60
Identities = 21/84 (25%), Positives = 35/84 (41%)

Query: 552 GRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQENVRALTKGVQA 611

           G D T + + + + +DE + R K +E I +K F TK

Sbjct: 248 GEDLETYESTSSNEIPESLQADENDVHSSRIATLKVLEGIFEK--FASTLPDGKTKMTFG 305

Query: 612 VMGYKPGSGVVTASTASSENGAGK 635

           ++GY P G +ST ++ G+ K

Sbjct: 306 LVGY-PNVG--KSSTINALVGSKK 326

Score = 43 (6.5 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 7/13 (53%), Positives = 9/13 (69%)

Query: 638 KKHGNRNKKEKSR 650

           KKH +NK+ K R

Sbjct: 596 KKHNKKNKRSKQR 608
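
The "Identities" figures reported for each HSP can be recomputed from the aligned rows. The short Python sketch below is an illustration rather than the program that produced this report; the function name `alignment_stats` is hypothetical, and it assumes that percentages are truncated rather than rounded, which is consistent with 7/13 being reported as 53% above.

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, aligned columns, truncated percent identity).

    Both rows must be the aligned strings, with '-' marking gaps.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # A column counts as an identity only when both rows carry the same residue.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    columns = len(query)
    return identities, columns, 100 * identities // columns


# Last HSP of the YAWG_SCHPO hit (query 638-650, subject 596-608).
print(alignment_stats("KKHGNRNKKEKSR", "KKHNKKNKRSKQR"))
```

The same integer arithmetic reproduces the other reported values, for example 131/323 as 40%.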





Pedant information for DKFZphtes3_8e24, frame 3 



Report for DKFZphtes3_8e24.3



[LENGTH] 658

[MW] 75226.58

[pI] 5.86

[HOMOL] SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME
I. 5e-56

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55

[FUNCAT] r general function prediction [M. jannaschii, MJ1464] 1e-16

[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09

[PIRKW] P-loop 1e-27

[PIRKW] GTP binding 1e-27

[SUPFAM] conserved hypothetical protein MG442 7e-08
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 3
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 19
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.56 %



SEQ MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD 

SEG xxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccccchhhhhhhhccccch 

SEQ DFLATAELAGTEFVAEKLNIKFVPAEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWN 

SEG 

PRD hhhhhhhhhhheeeecccceeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccccc 

SEQ QNTTPEELKQAEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQI

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhcceeeee 

SEQ VDARNPLLFRCEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWS

SEG 

PRD eccccccccchhhhhhhhhhhccccceeeeecccchhhhhhhhhhhhhhhhccceeeeec 

SEQ ALAGAIPLNGDSEEEANRDDRQSNTTEFGHSSFDQAEISHSESEHLPARDSPSLSENPTT 

SEG 

PRD cccccccccccchhhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

SEQ DEDDSEYEDCPEEEEDDWQTCSEEDGPKEEDCSQDWKESSTADSEARSRKTPQKRQIHNF 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccccccc 

SEQ SHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKSSTINTIMGNKKVSVSATPGH 

SEG 

PRD ccccchhhhhhhhhhhhhhhccccceeeeeecccccccccceeeeccccceeeeeccccc 

SEQ TKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSGILPIDQMRDHVPPVSLVCQNIPR 

SEG 

PRD cceeeeeeeccceeecccccccccccchhhhhhhhccccccccccccccceeeeecccch

SEQ HVLEATYGINIITPREDEDPHRPPTSEELLTAYGYMRGFMTAHGQPDQPRSARYILKDYV 

SEG 

PRD hhhhhhhhccccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhcc 

SEQ SGKLLYCHPPPGRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQE 

SEG 

PRD ccceeeeccccccccccchhhhhhhhhhcccchhhhhhhhcchhhhhhhhhhhhccccch 

SEQ NVRALTKGVQAVMGYKPGSGVVTASTASSENGAGKPWKKHGNRNKKEKSRRLYKHLDM 

SEG 

PRD hhhhhhhceeeeeecccccceeecccccccccccccccccccccchhhhhhhhhhccc 
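
The LOW_COMPLEXITY keyword value can be related to the SEG rows above, where masked residues are printed as `x`. The Python sketch below is only an illustration of that arithmetic; the function name `masked_percent` is hypothetical, and the two run lengths (13 and 17) are taken from the SEG rows as they appear in this listing.

```python
def masked_percent(seg_rows, protein_length, mask_char="x"):
    """Percentage of residues masked as low complexity, to two decimals."""
    masked = sum(row.count(mask_char) for row in seg_rows)
    return round(100 * masked / protein_length, 2)


# Two masked runs of 13 and 17 residues in a 658-residue protein.
seg_rows = ["x" * 13, "x" * 17]
print(masked_percent(seg_rows, 658))
```

Under these assumptions, 30 masked residues out of 658 correspond to the 4.56 % given in the [KW] LOW_COMPLEXITY line.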



Prosite for DKFZphtes3_8e24.3

PS00001 264->268 ASN_GLYCOSYLATION PDOC00001
PS00001 359->363 ASN_GLYCOSYLATION PDOC00001
PS00004 410->414 CAMP_PHOSPHO_SITE PDOC00004
PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 97->100 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 378->381 PKC_PHOSPHO_SITE PDOC00005
PS00005 448->451 PKC_PHOSPHO_SITE PDOC00005
PS00005 493->496 PKC_PHOSPHO_SITE PDOC00005
PS00005 531->534 PKC_PHOSPHO_SITE PDOC00005
PS00005 541->544 PKC_PHOSPHO_SITE PDOC00005
PS00005 649->652 PKC_PHOSPHO_SITE PDOC00005
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006
PS00006 155->159 CK2_PHOSPHO_SITE PDOC00006
PS00006 252->256 CK2_PHOSPHO_SITE PDOC00006
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006
PS00006 279->283 CK2_PHOSPHO_SITE PDOC00006
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006
PS00006 299->303 CK2_PHOSPHO_SITE PDOC00006
PS00006 305->309 CK2_PHOSPHO_SITE PDOC00006
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006
PS00006 322->326 CK2_PHOSPHO_SITE PDOC00006
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006
PS00006 365->369 CK2_PHOSPHO_SITE PDOC00006
PS00006 449->453 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 505->509 CK2_PHOSPHO_SITE PDOC00006
PS00007 480->488 TYR_PHOSPHO_SITE PDOC00007
PS00007 190->198 TYR_PHOSPHO_SITE PDOC00007
PS00008 9->15 MYRISTYL PDOC00008
PS00008 432->438 MYRISTYL PDOC00008
PS00008 620->626 MYRISTYL PDOC00008
PS00009 1->5 AMIDATION PDOC00009
PS00009 378->382 AMIDATION PDOC00009
PS00017 393->401 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_8e24.3)
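
The Prosite hits listed above are simple pattern matches and can be reproduced with a regular-expression scan. The Python sketch below is a minimal illustration and not the PEDANT or PROSITE software itself: the helper names `prosite_to_regex` and `scan` are hypothetical, the two patterns shown are the commonly cited definitions of PS00005 and PS00017 and should be checked against the PROSITE release in use, and hits are reported in the `start->end` convention of this listing, where `end` is the start plus the match length.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (e.g. '[AG]-x(4)-G-K-[ST]') to a regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            rx = "."                       # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            rx = core                      # single residue or [..] class
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(rx)
    return "".join(parts)


def scan(seq: str, pattern: str, offset: int = 1):
    """Return (start, end) pairs for all matches, including overlapping ones."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + offset, m.start() + offset + len(m.group(1)))
            for m in rx.finditer(seq)]


# Residues 1-30 and 381-410 of the DKFZphtes3_8e24.3 peptide, as listed above.
n_terminus = "MGRRRAPAGGSLGRALMRHQTQRSRSHRHT"
p_loop_region = "KVKDGQLTVGLVGYPNVGKSSTINTIMGNK"

print(scan(n_terminus, "[ST]-x-[RK]"))                     # PKC_PHOSPHO_SITE
print(scan(p_loop_region, "[AG]-x(4)-G-K-[ST]", offset=381))  # ATP_GTP_A
```

With these fragments the scan returns the PKC sites at 21->24 and 26->29 and the P-loop at 393->401, in agreement with the table.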
DKFZphtes3_8g11



group: testes derived

DKFZphtes3_8g11 encodes a novel proline-rich 939 amino acid protein without similarity to
known proteins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop).

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown, proline-rich protein
1 EST hit (from testis library)
Sequenced by MediGenomix



Locus: unknown 



Insert length: 3100 bp 

Poly A stretch at pos. 3056, polyadenylation signal at pos. 3041 



1 AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG 

51 AAGAAAGTGA GGACTCACAC AGTGATTCCC AGACAAGGAT TTCTGAGTCC 

101 CAACACTCCC TCAAGCCAAA TTATCTTTCC CAGGCCAAGA CTGACTTCTC 

151 AGAACAGTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA GCAGCAAAAC 

201 TCTTAAGGAG TCAAATACCC CCCGATGTGC CTCCACCTCT AGCTTCAGGT 

251 CTAGTCCTAA AATACCCTAT CTGCCTACAG TGTGGCCGAT GTTCAGGACT 

301 TAATTGCCAT CATAAATTAC AGACCACTTC GGGGCCTTAT CTTCTTATCT 

351 ATCCACAGCT CCACCTTGTA CGCACTCCTG AAGGCCATGG TGAGGTTCGG 

401 TTGCATCTTG GCTTTAGGCT GAGAATTGGG AAAAGATCCC AAATCTCAAA 

451 GTATCGTGAA AGAGATAGAC CCGTCATACG GAGAAGCCCT ATATGACCAT 

501 CACAAAGGAA AGCTAAAATC TATACTCAAG CTTCCAAGAG TCCTACTTCC 

551 ACAATAGATT TGCAGTCTGG GCCTTCCCAG TCCCCTGCTC CTGTACAAGT 

601 CTACATCAGG CGAGGACAAC GCAGCAGGCC TGACTTAGTA GAAAAGACAA 

651 AAACTAGAGC ACCTGGGCAC TATGAATTCA CTCAAGTTCA CAACCTACCA 

701 GAGAGTGACT CTGAAAGCAC TCAGAATGAA AAACGGGCTA AAGTGAGAAC 

751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAAGAGAATC ACCAAGCGAC 

801 TTAGAAAACA CAGAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT 

851 CCTTCTAGGG AATTAGCAGC CCATTTAAGA AGGAAGAGGA TTGGAGCAAC 

901 TCAGACAAGT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC 

951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAAGCGGGC ATTCCAAACA 

1001 GCACACAGAG TTATAGCTTC TGTTGGGCGG AAGCCTGTGG ACGGGACAAG 

1051 GCCAGACAAT TTGTGGGCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 

1101 GGGACTATTG CTTACCAAGC AGTATCAAAA GAGACAAGAG GTCAGCTGAC 

1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTGTGGGG 

1201 AGGAACGGTC CAGTGCAGAT CAGCTCAACA GCCAAGAAGA GCTTACTCTT 

1251 TCCAACCCAG ACCTCTTCGA CTGCCCAAGC CCACAGATTC CCAAAGTGGT 

1301 ATTGCTTTCC AAACTGCCTC AGTGGGGCAG CCTCTGAGAA CTGTTCAAAA 

1351 GGACAGTAGT AGCAGATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA 

1401 GCCAGGAGTC TAAGAACTTG TCCACACCAG GAACCAGAGT TCAGGCCCGA 

1451 GGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT 

1501 TAAAGACAAA CTCACACACA AGGAGCATAA CCACCCCAGC TTCTATAGGG 

1551 AGAGAACCCC ACGCGGTCCT TCTGAGAGAA CCCGTCATAA CCCCTCTTGG 

1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACGCAGTT CCTTGGAGAG 

1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA 

1701 ACCATTCCAG TCCTTCTGAG AGAAGCTGGC GCAGTCCGTC TCAGAGAAAT 

1751 CACTGCAGTC CCCCCGAGAG GAGCTGTCAC AGTCTCTCTG AAAGGGGCCT 

1801 TCACAGTCCC TCTCAGAGGA GCCATCGCGG TCCCTCTCAG AGAAGACATC 

1851 ACAGTCCCTC AGAGAGAAGC CATCGCAGTC CCTCAGAGAG AAGCCATCGC 

1901 AGTCCCTCTG AGAGAAGACA TCGCAGTCCC TCCCAGAGGA GCCATCGCGG 

1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC 

2001 CCTCTCAGAG GAGCCATCGT GGTCCCTCTG AGAGAAGACA TCACAGTCCC 

2051 TCTAAGAGAA GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 

2101 AGAGAGAAGC CATCACAGTC CCTCTGAGAG AAGCCATCAC AGTCCCTCTG 

2151 AGAGAAGACA TCACAGTCCC TCTGAGAGAA GCCATTGCAG TCCCTCTGAG 

2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC CCTCTGAGAG 

2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTGAGAGAA 

2301 GCCATCACAG TCCCTCTGAG AGAAGACGTC ACAGTCCCTT GGAGAGGAGC

2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGAGGAGATC

2401 TCACAGGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA 

2451 GTCCCTCAGA GAAGAGCCAC CTCAGTCCCT TGGAAAGAAG CCGTTGCAGT

2501 CCCTCTGAGA GGAGAGGACA CAGTTCCTCT GGGAAAACCT GTCACAGTCC 

2551 CTCTGAGAGA AGCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT 

2601 CTGAGAGGAG CCATCGCAGT TCCTGTGAGA GAACCCGTCA CAGTCCCTCT 
2651 GAGATGAGGC CAGGGAGGCC CTCTGGGAGG AACCATTGCA GTCCCTCTGA
2701 GAGGAGCCGA CGCAGTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG 
2751 GAGAGAGGCC CAGCCATAGT TTGTCTAGAG ATTTCAAGAA TCAAACAACT 
2801 CTCCTCGGGA CCACACATAA AAATCCCAAA GCAGGGCAAG TGTGGAGGCC 
2851 TGAAGCTACT CGATGAGGCG AGGTCCGCCC CTATTATTCA TTGTCCTAAG 
2901 TCTTCATCGT GCTGCCCTTT CCAGGCTTCT TTCCTGCTCA GCCACTGCCT 
2951 CCAATTCCTG CGCCCCCAGC GTGGAAAGGC TTCCATTTCT CTCTACCGGG 
3001 GGGGAGGCGG GTGAGAATGG GTCTGTAATT TCTCTAAGAT GAATAAAGGG 
3051 GCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG 



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP GTP A (824-832) 



1 MEESEDSQSD SQTRISESQH SLKPNYLSQA KTDFSEQFQL LEDLQLKIAA

51 KLLRSQIEPD VPPPLASGLV LKYPICLQCG RCSGLNCHHK LQTTSGPYLL 

101 IYPQLHLVRT PEGHGEVRLH LGFRLRIGKR SQISKYRERD RPVIRRSPIS

151 PSQRKAKIYT QASKSPTSTI DLQSGPSQSP APVQVYIRRG QRSRPDLVEK 

201 TKTRAPGHYE FTQVHNLPES DSESTQNEKR AKVRTKKTSD SKYPMKRITK 

251 RLRKHRKFYT NSRTTIESPS RELAAHLRRK RIGATQTSTA SLKRQPKKPS 

301 QPKFMQLLFQ SLKRAFQTAH RVIASVGRKP VDGTRPDNLW ASKNYYPKQN 

351 ARDYCLPSSI KRDKRSADKL TPAGSTIKQE DILWGGTVQC RSAQQPRRAY 

401 SFQPRPLRLP KPTDSQSGIA FQTASVGQPL RTVQKDSSSR SKKNFYRNET 

451 SSQESKNLST PGTRVQARGR ILPGSPVKRT WHRHLKDKLT HKEHNHPSFY 

501 RERTPRGPSE RTRHNPSWRN HRSPSERSQR SSLERRHHSP SQRSHCSPSR 

551 KNHSSPSERS WRSPSQRNHC SPPERSCHSL SERGLHSPSQ RSHRGPSQRR 

601 HHSPSERSHR SPSERSHRSP SERRHRSPSQ RSHRGPSERS HCSPSERRHR 

651 SPSQRSHRGP SERRHHSPSK RSHRSPARRS HRSPSERSHH SPSERSHHSP 

701 SERRHHSPSE RSHCSPSERS HCSPSERRHR SPSERRHHSP SEKSHHSPSE 

751 RSHHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSFE RSHRRISERS 

801 HSPSEKSHLS PLERSRCSPS ERRGHSSSGK TCHSPSERSH RSPSGMRQGR 

851 TSERSHRSSC ERTRHSPSEM RPGRPSGRNH CSPSERSRRS PLKEGLKYSF 
901 PGERPSHSLS RDFKNQTTLL GTTHKNPKAG QVWRPEATR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8g11, frame 2

TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific
acidic repeat protein precursor"; Phytophthora infestans cyst
germination specific acidic repeat protein precursor (car90) gene,
complete cds., N = 1, Score = 457, P = 2.3e-39

TREMBL:AC004561_38 gene: "F16P2.41"; product: "putative proline-rich
protein"; Arabidopsis thaliana chromosome II BAC F16P2 genomic
sequence, complete sequence., N = 1, Score = 340, P = 4.2e-27

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P =
3.6e-24

PIR:PN0099 son3 protein - human (fragment), N = 1, Score = 292, P =
1.2e-22

>TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 
Length = 1,489

HSPs:

Score = 457 (68.6 bits), Expect = 2.3e-39, P = 2.3e-39
Identities = 91/444 (20%), Positives = 239/444 (53%)

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533

           +P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S

Sbjct: 584 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 642

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593

           E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P+ + +

Sbjct: 643 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 702

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653

           P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+

Sbjct: 703 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 762

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713

           + + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E +

Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773

           +P+E + P+E +P+E ++P+E++ ++P+E++ ++P+E ++P E + +

Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832

           E + +P++ ++ E + + E +++P+E++ +P E + P+E ++ + +T

Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892

           ++P+E + +P+ +E + + E T + P+E P+ +P+E + +P+

Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002

Query: 893 KEGLKYSFPGERPSHSLSRDFKNQTT 918

           +E Y+ P E +++ + + + T

Sbjct: 1003 EE-TTYA-PTEETTYAPAEETPYEPT 1026

Score = 445 (66.8 bits), Expect = 4.5e-38, P = 4.5e-38
Identities = [illegible]/394 (21%), Positives = 212/394 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

           E TP P+E T + P+ +P+E + + E ++P++ + +P+ + P+E +

Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

           +P++ P E + ++ +E ++P++ + P+++ ++P+E + +P+E + P+

Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

           E +P++ + P+E + + +E +P++ + P+E + P++ + +P +

Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

           +P+E + ++P+E + ++P+E ++P+E + P+E + +P+E +P+E ++P

Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800

           E++ ++P+E + ++P+E + P E + ++ E + +P+E ++ S E + + E +

Sbjct: 1003 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETT 1062

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860

           ++P+E++ P E + + P+E ++ + +T ++P+E + +P+ +E +

Sbjct: 1063 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 1122

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894

           E T ++P+E P+ +P E + P+E

Sbjct: 1123 EETTYAPTEETTYAPTEETMYAPIEETTYGPTEE 1156

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37





Identities = 86/421 (20%), Positives = 223/421 (52%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 848 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 906

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 

Sbjct: 907 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 966
Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653

P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 
Sbjct: 967 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 1026

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 1027 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1086

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 1087 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1146

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 

Sbjct: 1147 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 1206

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E + +P+ +E + + E T + P+E P+ +P+E + +P 

Sbjct: 1207 YAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPT 1266

Query: 893 KE 894 
+E 

Sbjct: 1267 EE 1268 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37
Identities = 91/434 (20%), Positives = 232/434 (53%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 440 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 498

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 499 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 558

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +p + + P+E + +P+E P+ 
Sbjct: 559 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 618

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 
Sbjct: 619 EETT YAPTEETT YAPTEETTYASTEETT YAPTEETT YAP AEETPYEPTEETT YAPTEETT 678 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 679 YAPTEETT YAPTEETT YAPT EETT YAPAEETPYEPTEETT YAPTEETT YAPTEETMYAP I 738 

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832 

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 
Sbjct: 739 EETT YGPTEETTYAPTEATTYAPTEETP YAPT EETT YEPTGETT YAPTEETT YAPTEETT 798 

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E ++P TE++ET ++P+E P P+ +P+E + +P 

Sbjct: 799 YAPTEETTYAP TEET PYEPT- EETT YAPTEETP YE PTEETTYTPTEETT YAPT 850 

Query: 893 KEGLKYSFPGERPSHS 908

+E Y+ P E+ +++ 
Sbjct: 851 EE-TTYA-PTEKTTYA 864 

Score = 437 (65.6 bits), Expect = 3.3e-37, P = 3.3e-37 
Identities = 85/417 (20%), Positives = 223/417 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

E TP P+E T + P+ +P+E + + E+ ++P++ + +P+ + P+E +
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E +P++ + P+E + +P+E P++ + P+E ++P++ + +P +
Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

+P+E + ++P+E + + P+E ++P+E + +P+E + + +E +P+E ++P+
Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800

E++ + P+E + ++P+E ++P E + ++ E + +P+E ++ E + + E +
Sbjct: 659 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 718

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860

++P+E++ +P E + +P E + + +T ++P+E + +P+ +E +
Sbjct: 719 YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 778

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTT 918

T ++P+E P+ +P+E + +P +E Y P E +++ + + + T
Sbjct: 779 GETTYAPTEETTYAPTEETTYAPTEETTYAPTEE-TPYE-PTEETTYAPTEETPYEPT 834

Score = 428 (64.2 bits), Expect = 3.1e-36, P = 3.1e-36
Identities = 89/440 (20%), Positives = 228/440 (51%)

Query: 473 PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 531

P P + T + K+ T+ ++ E T P+E T + P+ P+E + +
Sbjct: 470 PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA 528

Query: 532 SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR 591

E ++P++ + +P+ + +P+E + +P++ P E + ++ +E ++P++
Sbjct: 529 PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 588

Query: 592 SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 651

+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E +
Sbjct: 589 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 648

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711

P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E
Sbjct: 649 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++
Sbjct: 709 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 768

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + +
Sbjct: 769 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 828

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890

T + P+E + +P+ +E + + E+T ++P+E P+ P+E + +
Sbjct: 829 TPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYA 888

Query: 891 PLKEGLKYSFPGERPSHSLSRD 912

P KE Y+ P E +++ + +
Sbjct: 889 PTKE-TTYA-PTEETTYASTEE 908

Score = 427 (64.1 bits), Expect = 4.0e-36, P = 4.0e-36
Identities = 81/394 (20%), Positives = 213/394 (54%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

E T GP+E T + P+ +P+E + + E + P+ + +P+ + +P+E +
Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P++ +P E + + +E ++P++ + P++ ++P+E + +P+E + +P+
Sbjct: 799 YAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPT 858

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E+ +P++ + P+E + P+E +P++ + P+E ++ ++ + +P +
Sbjct: 859 EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETT 918

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

+P+E + + P+E + ++P+E ++P+E + +P+E + +P+E +P+E + P+
Sbjct: 919 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 978

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800

E++ ++P+E + ++P+E ++P+E + ++ E + +P+E + E + + E +
Sbjct: 979 EETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1038

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860

++P+E++ + E + +P+E ++ + +T + P+E + +P+ +E + +
Sbjct: 1039 YAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPT 1098

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894

E T ++P+E P+ P+E + +P +E
Sbjct: 1099 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 1132

Score = 424 (63.6 bits), Expect = 8.5e-36, P = 8.5e-36
Identities = 81/394 (20%), Positives = 210/394 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

E T P+E T + P+ +P+E + + E + P++ + +P+ + +P+E +
Sbjct: 939 EETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETM 998

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P + +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+ 
Sbjct: 999 YAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPT 1058

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E +P++ + P+E + +P+E +P++ + P+E ++P++ + +PA +
Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

P+E + ++P+E + ++P+E ++P E + P+E + +P+E +P+E ++P+
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ +9+ + ++P+E ++P E + ++ E + +P+E + E + + E + 
Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238 

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 

+ P+E++ +P E + +P+E ++ + +T ++P + + P+ +E + + 

Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298 

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+G +P+E + +P +E 

Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEE 1332 



Score = 422 (63.3 bits), Expect = 1.4e-35, P = 1.4e-35
Identities = 84/407 (20%), Positives = 216/407 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

E T P+E T + P+ P+E + + E + P++ + +P+ + +P+E +
Sbjct: 795 EETTYAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETT 854

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P+++ +P E + ++ +E + P++ + P++ ++P+E + + +E + +P+
Sbjct: 855 YAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYASTEETTYAPT 914

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E + P++ + P+E + +P+E +P++ + P+E ++P++ + +PA +
Sbjct: 915 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 974

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

P+E + ++P+E + ++P+E ++P E + +P+E + +P+E P+E ++P+
Sbjct: 975 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPT 1034

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800

E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E + + E +
Sbjct: 1035 EETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETT 1094

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860

++P+E++ +P E + +P+E + + +T ++P+E + +P+ E +
Sbjct: 1095 YAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPT 1154

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908

E T ++P+E P+ +P+E + P E Y+ P E +++
Sbjct: 1155 EETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 1200





Score = 421 (63.2 bits), Expect = 1.8e-35, P = 1.8e-35 
Identities = 36/418 (20%), Positives = 219/418 (52%) 



Query: 491 HKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSR 550

H H E T P+E T + P+ +P+E + + E + P++ + +P+
Sbjct: 376 HYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTPTE 435

Query: 551 KNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHR 610

+ +P+E + +P+++ +P E + ++ +E + P++ + P++ ++P+E +
Sbjct: 436 ETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTY 495

Query: 611 SPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSK 670

+ +E + +P+E +P++ + P+E + +P+E +P++ + P+E ++P++
Sbjct: 496 ASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTE 555

Query: 671 RSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHR 730

+ +PA + P+E + ++P+E + ++P+E ++P E + +P+E + +P+E
Sbjct: 556 ETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPY 615

Query: 731 SPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFE 790

P+E ++P+E++ ++P+E + ++ +E ++P E + ++ E + P+E ++ E
Sbjct: 616 EPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTE 675

Query: 791 RS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQG 849 
+ + E +++P+E++ +P E + +P+E + + +T ++P+E + +P+ 



Sbjct:


675 


Query: 


850 


Sbjct: 


736 


Score 


= 420 


Identities : 


Query : 


502 


Sbj ct : 


971 


Query : 


562 


Sbjct : 


1031 


Query : 


622 


Sbj ct : 


1091 


Query: 


682 


Sbjct : 


1151 


Query: 


742 


Sbjct: 


1211 


Query : 


801 


Sbjct: 


1271 


Query: 


861 


Sbjct: 


1331 


Score 


= 419 


Identities = 


Query : 


502 


Sbjct: 


947 


Query: 


562 


Sbjct: 


1007 


Query : 


622 


Sbjct : 


1067 


Query: 


682 


Sbjct: 


1127 


Query: 


742 


Sbjct: 


1187 


Query : 


801 


Sbjct: 


1247 


Query : 


861 


Sbjct: 


1307 


Score 


= 415 


Identities = 


Query: 


473 


Sbjct: 


878 


Query: 


532 


Sbjct: 


937 


Query: 


592 



ETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMY 735 



908 



E + 



E T ++P+E 



P+ 



2.3e-35, P - 2.3e-35 



Y+ P E +++ 



E TP P+E T + P+ 



+ P+E + + +E 



+P E + ++ +E 



+ + P + + + 



++P++ + +P+ + 



+ P+E + +P+E + +P+ 



P+E + +P+E 



P+E + ++P+E + ++P+E 



++P++ + +P 



P+ 



E++ ++P+E + + P+E ++P E + + E + +P+E ++ E + + 



++P +++ 



P E + +P+E 



E T ++P E 



++ + +T ++P+E + 



-SGRNHCSPSE 885 
S C+ E 



P+G 



+E + 



(52. 9 bits), Expect = 3.0e-35, P - 3.0e-35 
83/411 (20%), Positives = 215/411 (52%) 



P+E T + P+ 



++P++ + +P+ + 



++P++ + 



P++ 



+P+E + +P+ 



P++ + 



P+E + +P+E 



+ P++ + 



P+E 



++P++ + 



+P+E + ++P+E + ++P E 



+ P+E + +P+E + +P+E 



++ ++P+E + ++P+E ++P E + ++ E + 



P+E ++ 



+ P+E 



E + + 



t Pt 



E + 



++P+E++ +P E + +P+E 



+T + P+E + +P+ 



E T + P+ 



P+ 



+P+E + +P++E 



P E + ++S + 



P + T + 



K+ T+ 



8.0e-35 



P+E T + P+ 



P+E + 



++P++ + +P+ + +P+E + +P++ 



++P+E + +P+E + 



P E + ++ +E 



++P++ 



P+E + + +E 



Sbjct: 997 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 1056

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

P++ + P+E + P++ + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 1057 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 1116 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 

Sbjct: 1117 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 1176 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 
Sbjct: 1177 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 1236 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + +P+ +E + + E T ++P + P+ +P+E + + 

Sbjct: 1237 TTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYA 1296 

Query: 891 PLKE 894 
P +E 

Sbjct: 1297 PTEE 1300 

Score = 403 (60.5 bits), Expect = 1.6e-33, P = 1.6e-33 
Identities = 84/394 (21%), Positives = 213/394 (54%) 

Query: 501 RERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERS 560

RE T PSE T + P +P+E+ +E + + ++ +P++ ++P+ER
Sbjct: 319 REETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTYVTEESTY-APTKSETNAPTERM 375

Query: 561 WRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSP 620 

{■ ++ C E + ++ +E ++P++ + P++ + + P+E + P+E + +P 
Sbjct: 376 HYAHIEKP-CDT-EVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTP 433 

Query: 621 SERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRS 680 

+E +P++ + P+E++ +P+E +P++ + P+E ++P+K + +P +

Sbjct: 434 TEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 493

Query: 681 HRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSP 740

+ +E + ++P+E + + + P + E + P+E + +P+E + +P+E +P+E ++P
Sbjct: 494 TYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAP 553

Query: 741 SEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISER 799

+E++ ++P+E + + P+E ++P E+++E++PE++ E++ E
Sbjct: 554 TEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEET 613

Query: 800 SHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSS 859

+ P+E++ +P E + +P+E ++S+ +T ++P+E + +P+ +E + +

Sbjct: 614 PYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAP 673

Query: 860 CERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894

E T ++P+E P+ +P+E + +P +E

Sbjct: 674 TEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708

Score = 398 (59.7 bits), Expect = 5.5e-33, P = 5.5e-33 
Identities = 84/402 (20%), Positives = 209/402 (51%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 992 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 1050 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 
Sbjct: 1051 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 1110 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 1111 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1170 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 1171 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 1230

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E + ++P + + P E + ++ 
Sbjct: 1231 YAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPT 1290 

Query: 774 ERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCH 833

E + +P+E + E E ++ P+ ++ +P E + +P+E ++ +T + 

Sbjct: 1291 EATTYAPTEETPYAPTE ETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPY 1343 

Query: 834 SPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 
P+E S + S + TE+ +ET PS+ P+
Sbjct: 1344 EPAEESTSTVSTEKPCNTEEFTDEPTDEPT-DEPSDEPTDEPT 1385 

Score = 368 (55.2 bits), Expect = 9.5e-30, P = 9.5e-30 
Identities = 79/386 (20%), Positives = 211/386 (54%) 

Query: 524 PSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSER 583 

PS+ ++ + E + P + + +PS +P E + +P+++ + E + + ++E 

Sbjct: 303 PSDETEAPT-EGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTY-VTEE 358

Query: 584 GLHSPSQRSHRGPSQRRHHSPSER SHRSPSERSHRSPSERRHRSPSQRSHRGPS 637

++P+ + P++R H++ E+ + +P+E + +P+E +P++ + P+ 

Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

Query: 638 ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSH 697 

E + P+E +P++ + P+E ++P++++ +P + +P+E + + P+E + 

Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478

Query: 698 HSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPS 757 

++P++ ++P+E + + +E + +P+E +P+E + P+E++ ++P+E + ++P+ 
Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 758 ERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSR 816 

E ++P E + ++ E + +P+E + E + + E +++P+E++ +P+E + 

Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598 

Query: 817 CSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876 

+P+E ++ + +T + P+E + +P+ +E + +S E T ++P+E P+ 

Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658

Query: 877 GRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

P+E + + P +E Y+ P E +++ 
Sbjct: 659 EETPYEPTEETTYAPTEE-TTYA-PTEETTYA 688 
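
The HSP headers in this report give an Expect value and a P value that are numerically identical. A minimal sketch of why that can happen, assuming the usual Poisson model in which P = 1 - exp(-E) (the report itself does not state how P was derived, and the function name below is hypothetical):

```python
import math


def p_value_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP, assuming P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but keeps precision for very small E.
    return -math.expm1(-expect)


# Expect values taken from HSP headers in this listing, plus E = 1 for contrast.
for e in (2.0e-06, 9.5e-30, 1.0):
    print(f"E = {e:g} -> P = {p_value_from_expect(e):.3g}")
```

Under that assumption the two numbers only diverge visibly once E approaches 1, which is consistent with every header in this listing showing the same figure for both.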

Score = 337 (50.6 bits), Expect = 2.1e-26, P = 2.1e-26
Identities = 66/328 (20%), Positives = 170/328 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E T P+E T + P+ +P+E + + E ++P++ + +P+ + +P+E + 

Sbjct: 1059 EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETP 1118 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

P++ +P E + ++ +E +++P + + GP++ ++P+E + +P+E + +P+ 
Sbjct: 1119 YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 1178

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E P+ + P+E + +P+E +P++ + P+E + P++ + +P +

Sbjct: 1179 EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 1238

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

P+E + ++P+E + ++P+E ++P+E + +P + + P+E +P+E ++P+

Sbjct: 1239 YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPT 1298

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRIS 797

E++ ++P+E + + P+ ++P E + ++ E + +P E + E S +S 

Sbjct: 1299 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKP 1358 

Query: 798 ERSHSPSEKSHLSPLERSRCSPSE 821 

E + P+++ P + P++ 
Sbjct: 1359 CNTEEFTDEPTDEPTDEPSDEPTDEPTD 1386 

Score = 333 (50.0 bits), Expect = 5.7e-26, P = 5.7e-26 
Identities = 63/320 (19%), Positives = 166/320 (51%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561

E T P+E T + P+ +P+E + + E ++P++ + P+ + +P+E +

Sbjct: 1075 EETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1134

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+P++ +P E + + +E ++P++ + P++ ++P+E + P+ + +P+
Sbjct: 1135 YAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPT 1194

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E +P++ + P+E + +P+E P++ + P+E + P++ + +P +

Sbjct: 1195 EETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETT 1254

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741

+P+E + ++P+E + ++P + + P+E + +P+E + +P+E +P+E + P+

Sbjct: 1255 YAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPT 1314

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRISERSH 801
++ ++P+E + ++P+E ++P+E ++ ES+S+ +E+ E+
Sbjct: 1315 GETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKPCNTEEFTDEPTDEPTD 1374

Query: 802 SPSEKSHLSPLERSRCSPSE 821

PS++ P + P++
Sbjct: 1375 EPSDEPTDEPTDEPTDLPTD 1394

Score = 303 (45.5 bits), Expect = 9.6e-23, P = 9.6e-23
Identities = 70/322 (21%), Positives = 170/322 (52%)

Query: 584 GLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCS 643

G + PS + P++ + P E + +PSE + +P E + P++ + + E + + +
Sbjct: 299 GGYEPSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY-DVEETTYVT 356

Query: 644 PSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSER 703

E +P++ P+ER H++ + + + + +P+E + ++P+E + ++P+E
Sbjct: 357 --EESTYAPTKSETNAPTERMHYAHIEKPCDTEV--TMYAPTEETTYAPTEETTYAPTEE 412

Query: 704 RHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHS 763

++P+E + P+E + +P+E +P+E ++P+EK+ ++P+E + ++P+E +
Sbjct: 413 TTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYE 472

Query: 764 PLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSER 822

P E + ++ + + +P+E ++ S E + + E +++P+E++ P E + +P+E
Sbjct: 473 PTEETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEE 532

Query: 823 RGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCS 882

++ + +T ++P+E + +P+ +E + E T ++P+E P+ +
Sbjct: 533 TTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYA 592

Query: 883 PSERSRRSPLKEGLKYSFPGERP 905

P E + +P +E Y+ E P
Sbjct: 593 PIEETTYAPTEE-TTYAPAEETP 614

Score = 151 (22.7 bits), Expect = 2.0e-06, P = 2.0e-06
Identities = 45/198 (22%), Positives = 103/198 (52%)

Query: 716 PSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLLER 775

PS+ + +P+E P E +PSE + ++P E + ++P+E+ +E + + + E
Sbjct: 303 PSDETE-APTEGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPYD--VEETTY-VTEE 358

Query: 776 SHRSPSERRSHRSFERSHRRISERS HSPSEKSHLSPLERSRCSPSERRGHSSS 828

S +P++ ++ ER H E+ ++P+E++ +P E + +P+E ++ +
Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418

Query: 829 GKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSR 888

+T + P+E + +P+ +E + + E+T ++P+E P+ P+E +
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478

Query: 889 RSPLKEGLKYSFPGERPSHSLSRD 912

+P KE Y+ P E +++ + +
Sbjct: 479 YAPTKE-TTYA-PTEETTYASTEE 500





Pedant information for DKFZphtes3_8g11, frame 2



Report for DKFZphtes3_8g11.2



[LENGTH] 954

[MW] 110063.05

[pI] 11.40

[PROSITE] ATP_GTP_A 1

[KW] Irregular

[KW] LOW_COMPLEXITY 27.67 %

SEQ ESSLSIFYDREDLVPMEESEDSQSDSQTRISESQHSLKPNYLSQAKTDFSEQFQLLEDLQ 

SEG xxxxxxxxxxx 



PRD ccceeeccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ LKIAAKLLRSQIPPDVPPPLASGLVLKYPICLQCGRCSGLNCHHKLQTTSGPYLLIYPQL 

SEG 

PRD hhhhhhhhhhcccccccccccceeeeecceeecccccccccccccccccccceeeehhhh 

SEQ HLVRTPEGHGEVRLHLGFRLRIGKRSQISKYRERDRPVIRRSPISPSQRKAKIYTQASKS

SEG 

PRD hcccccccccceeecccceeeccccccccccccccceeeeeccccccchhhhhhhccccc 

SEQ PTSTIDLQSGPSQSPAPVQVYIRRGQRSRPDLVEKTKTRAPGHYEFTQVHNLPESDSEST 
SEG

PRD ccccccccccccccccceeeeeeeccccccchhhhhhcccccceeeeeecccccccccch 

SEQ QNEKRAKVRTKKTSDSKYPMKRITKRLRKHRKFYTNSRTTIESPSRELAAHLRRKRIGAT 

SEG 

PRD hhhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhcc 

SEQ QTSTASLKRQPKKPSQPKFMQLLFQSLKRAFQTAHRVIASVGRKPVDGTRPDNLWASKNY 

SEG 

PRD ccchhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ YPKQNARDYCLPSSIKRDKRSADKLTPAGSTIKQEDILWGGTVQCRSAQQPRRAYSFQPR 

SEG 

PRD cccccccccccccccccccccccccccccccccccceeeccccccccccccccccccccc 

SEQ PLRLPKPTDSQSGIAFQTASVGQPLRTVQKDSSSRSKKNFYRNETSSQESKNLSTPGTRV 

SEG 

PRD ccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccccccee 

SEQ QARGRILPGSPVKRTWHRHLKDKLTHKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPS 

SEG xxxxx 

PRD eeecccccccccccccccccccccccccccccceeeeccccccccccccccccccccccc 

SEQ ERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGL

SEG xxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD chhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERSRHSLLERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGH 

SEG XXXXX XX XX XXX XX XXX XX XXX XX XX xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ SSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSE 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ RSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTTLLGTTHKNPKAGQVWRPEATR 

SEG 

PRD ccccccccccceeecccccccccccccccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_8g11.2
PS00017 839->847 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_8g11.2)
DKFZphtes3_8g5



group: testes derived

DKFZphtes3_8g5 encodes a novel 544 amino acid protein nearly identical to the human KIAA0875
protein.

The novel protein is a new splice variant of KIAA0875.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



KIAA0875, alternatively spliced

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp 

No poly A stretch found, no polyadenylation signal found 



1 CCGACATCGG CCGTGTCTCC AGCACCTGCC GGCGGCTGCG CGAGCTGTGC
51 CAGAGCAGCG GGAAGGTGTG GAAGGAGCAG TTCCGGGTGA GGTGACCTTC
101 CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT
151 ATAAAGTTCG GCAAAAAGCT GGGTTAGAAG CGCGGAAGAT TGTAGCCTCG
201 TTCTCAAAGA GGTTCTTTTC AGAGCACGTT CCTTGTAATG GCTTCAGTGA
251 CATTGAGAAC CTTGAAGGAC CAGAGATTTT TTTTGAGGAT GAACTGGTGT
301 GTATCCTAAA TATGGAAGGA AGAAAAGCTT TGACCTGGAA ATACTACGCA
351 AAAAAAATTC TTTACTACCT GCGGCAACAG AAGATCTTAA ATAATCTTAA
401 GGCCTTTCTT CAGCAGCCAG ATGACTATGA GTCGTATCTT GAAGGTGCTG
451 TATATATTGA CCAGTACTGC AATCCTCTCT CCGACATCAG CCTCAAAGAC
501 ATCCAGGCCC AAATTGACAG CATCGTGGAG CTTGTTTGCA AAACCCTTCG
551 GGGCATAAAC AGTCGCCACC CCAGCTTGGC CTTCAAGGCA GGTGAATCAT
601 CCATGATAAT GGAAATAGAA CTCCAGAGCC AGGTGCTGGA TGCCATGAAC
651 TATGTCCTTT ACGACCAACT GAAGTTCAAG GGGAATCGAA TGGATTACTA
701 TAATGCCCTC AACTTATATA TGCATCAGGT TTTGATTCGC AGAACAGGAA
751 TCCCAATCAG CATGTCTCTG CTCTATTTGA CAATTGCTCG GCAGTTGGGA
801 GTCCCACTGG AGCCTGTCAA CTTCCCAAGT CACTTCTTAT TAAGGTGGTG
851 CCAAGGCGCA GAAGGGGCGA CCCTGGACAT CTTTGACTAC ATCTACATAG
901 ATGCTTTTGG GAAAGGCAAG CAGCTGACAG TGAAAGAATG CGAGTACTTG
951 ATCGGCCAGC ACGTGACTGC AGCACTGTAT GGGGTGGTCA ATGTCAAGAA
1001 GGTGTTACAG AGAATGGTGG GAAACCTGTT AAGCCTGGGG AAGCGGGAAG
1051 GCATCGACCA GTCATACCAG CTCCTGAGAG ACTCGCTGGA TCTCTATCTG
1101 GCAATGTACC CGGACCAGGT GCAGCTTCTC CTCCTCCAAG CCAGGCTTTA
1151 CTTCCACCTG GGAATCTGGC CAGAGAAGTC TTTCTGTCTT GTTTTGAAGG
1201 TGCTTGACAT CCTCCAGCAC ATCCAAACCC TAGACCCGGG GCAGCACGGG
1251 GCGGTGGGCT ACCTGGTGCA GCACACTCTA GAGCACATTG AGCGCAAAAA
1301 GGAGGAGGTG GGCGTAGAGG TGAAGCTGCG CTCCGATGAG AAGCACAGAG
1351 ATGTCTGCTA CTCCATCGGG CTCATTATGA AGCATAAGAG GTATGGCTAT
1401 AACTGTGTGA TCTACGGCTG GGACCCCACC TGCATGATGG GACACGAGTG
1451 GATCCGGAAC ATGAACGTCC ACAGCCTGCC GCACGGCCAC CACCAGCCTT
1501 TCTATAACGT GCTGGTGGAG GACGGCTCCT GTCGATACGC AGCCCAAGAA
1551 AACTTGGAAT ATAACGTGGA GCCTCAAGAA ATCTCACACC CTGACGTGGG
1601 ACGCTATTTC TCAGAGTTTA CTGGCACTCA CTACATCCCA AACGCAGAGC
1651 TGGAGATCCG GTATCCAGAA GATCTGGAGT TTGTCTATGA AACGGTGCAG
1701 AATATTTACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTAGAGAG
1751 GACATTGCAC CTTTGCTGCT GCTGCTATCT TCCAAGAGAA CGGGACTCCG
1801 GAAGAAGACG TCTCCACGGA GCCCTCGGGA CCTGCTGCAC CAGGAAAGCC
1851 ACTCCACCAG TAGTGCTGGT TGCCTCCTAC TAAGTTTAAA TACCGTGTGC
1901 TCTTCCCCAG CTGCAAAGAC AATGTTGCTC TCCGCCTACA CTAGTGAATT
1951 AATCTGAAAG GCACTGTGTC AGTGGCATGG CTTGTATGCT TGTCCTGTGG
2001 TGACAGTTTG TGACATTCTG TCTTCATGAG GTCTCACAGT CGACGCTCCT
2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT GCATTTGTCT
2101 CAGAACATTT CCTTGGCTGG ACAGATGGGG TTATGCATTT GCAATAATTT
2151 CCTTCTGATT TCTCTGTGGA ACGTGTTCGG TCCCGAGTGA GGACTGTGTG
2201 TCTTTTTACC CTGAAGTTAG TTGCATATTC AGAGGTAAAG TTGTGTGCTA
2251 TCTTGGCAGC ATCTTAGAGA TGGAGACATT AACAAGCTAA TGGTAATTAG
2301 AATCATTTGA ATTTATTTTT TTCTAATATG TGAAACACAG ATTTCAAGTG
2351 TTTTATCTTT TTTTTTTTTA AATTTAAATG GGAATATAAC ACAGTTTTCC
2401 CTTCCATATT CCTCTCTTGA GTTTATGCAC ATCTCTATAA ATCATTAGTT
2451 TTCTATTTTA TTACATAAAA TTCTTTTAGA AAATGCAAAT AGTGAACTTT
2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATGACT
2551 ACTTTTATTT TTTAATTTAA AAAATCTACT TCAGTATCAT GAGTAGGTCT
2601 TACATCAGTG ATGGGTTCTT TTTGTAGTGA GACATACAAA TCTGATGTTA
2651 ATGTTTGCTC TTAGAAGTCA TACTCCATGG TCTTCAAAGA CCAAAAAATG
2701 AGGTTTTGCT TTTGTAATCA GGAAAAAAAA AATTAATGAA CCTTAAAAAA 
2751 AAAAAAAAAA GG 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 
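
The ORF coordinates and peptide length above can be cross-checked against the cDNA listing. The following is a minimal sketch, assuming the standard genetic code and that the reported ORF end (1736 bp) excludes the stop codon; the helper names are hypothetical, and the two 50-nt fragments are the rows starting at positions 101 and 1701 of the insert sequence.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of `dna`; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def orf_codon_count(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3


ORF_START, ORF_END = 105, 1736
ROW_101 = "CCTTATGAAACACTACAGCCCCACCGACTACGTCAATTGGTTGGAAGAGT"    # positions 101-150
ROW_1701 = "AATATTTACAGTGCAAAGAAAGAGAACATAGATGAGTAAAGTCTAGAGAG"   # positions 1701-1750

print(orf_codon_count(ORF_START, ORF_END))      # codons between 105 and 1736
print(translate(ROW_101[ORF_START - 101:]))     # start of the reading frame
print(translate(ROW_1701[:39]))                 # last codons plus the one after 1736
```

Under these assumptions the codon count equals the stated peptide length of 544, the first fragment translates to the beginning of the listed peptide, and the codon immediately after position 1736 is a stop codon.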



1 MKHYSPTDYV NWLEEYKVRQ KAGLEARKIV ASFSKRFFSE HVPCNGFSDI 

51 ENLEGPEIFF EDELVCILNM EGRKALTWKY YAKKILYYLR QQKILNNLKA 

101 FLQQPDDYES YLEGAVYIDQ YCNPLSDISL KDIQAQIDSI VELVCKTLRG 

151 INSRHPSLAF KAGESSMIME IELQSQVLDA MNYVLYDQLK FKGNRMDYYN 

201 ALNLYMHQVL IRRTGIPISM SLLYLTIARQ LGVPLEPVNF PSHFLLRWCQ 

251 GAEGATLDIF DYIYIDAFGK GKQLTVKECE YLIGQHVTAA LYGVVNVKKV 

301 LQRMVGNLLS LGKREGIDQS YQLLRDSLDL YLAMYPDQVQ LLLLQARLYF 

351 HLGIWPEKSF CLVLKVLDIL QHIQTLDPGQ HGAVGYLVQH TLEHIERKKE

401 EVGVEVKLRS DEKHRDVCYS IGLIMKHKRY GYNCVIYGWD PTCMMGHEWI 

451 RNMNVHSLPH GHHQPFYNVL VEDGSCRYAA QENLEYNVEP QEISHPDVGR 

501 YFSEFTGTHY IPNAELEIRY PEDLEFVYET VQNIYSAKKE NIDE 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_8g5, frame 3 

TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein";
Homo sapiens mRNA for KIAA0875 protein, partial cds., N = 1, Score =
2832, P = 5.5e-295



>TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo
sapiens mRNA for KIAA0875 protein, partial cds.
Length = 621 



HSPs : 



Score = 2832 (424.9 bits). Expect = 5.5e-295, P = 5.5e-295 
Identities - 537/544 (98%), Positives = 537/544 (98%) 



Query:   1 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 60
           MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF
Sbjct:  85 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 144

Query:  61 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 120
           EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ
Sbjct: 145 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 204

Query: 121 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 180
           YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA
Sbjct: 205 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 264

Query: 181 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 240
           MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF
Sbjct: 265 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 324

Query: 241 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 300
           PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV
Sbjct: 325 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 384

Query: 301 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 360
           LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK






Sbjct: 385 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK-- 442

Query: 361 CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 420
                VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS
Sbjct: 443 -----VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 497

Query: 421 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 480
           IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA
Sbjct: 498 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 557

Query: 481 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 540
           QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE
Sbjct: 558 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 617

Query: 541 NIDE 544
           NIDE
Sbjct: 618 NIDE 621
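
The "Identities = 537/544 (98%)" figure can be recomputed from any pair of aligned rows. The sketch below is illustrative only: the function name `identity_stats` is hypothetical, gaps are assumed to be written as `-`, and the percentage is truncated to a whole number, which is an assumption that happens to agree with 537/544 being printed as 98%.

```python
def identity_stats(query, sbjct, gap="-"):
    """Return (identities, columns, percent) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)
    columns = len(query)
    percent = 100 * identities // columns  # truncated, not rounded (assumption)
    return identities, columns, percent


# Fragment spanning the gapped region of the alignment above
print(identity_stats("WPEKSFCLVLKVLDIL", "WPEK-------VLDIL"))
```

Summing the per-row identity counts over all rows of one HSP gives the numerator reported on its "Identities" line.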

Pedant information for DKFZphtes3_8g5, frame 3 



Report for DKFZphtes3_8g5.3 

[LENGTH] 544 

[MW] 63307.22 

[pI] 5.82 

[HOMOL] TREMBL:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo sapiens 
mRNA for KIAA0875 protein, partial cds. 0.0 
[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 1.84 % 



SEQ MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 

SEG 

PRD cccccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccccccccccccccceee 

SEQ EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 

SEG 

PRD eeeeeeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeecceeeeeee 

SEQ YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhcccccccccceeeecccchhhhhhhhhhhhhhh 

SEQ MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 

SEG 

PRD hhhhhccccccccccchhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccccc 

SEQ PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 

SEG 

PRD cceeeeeeccccccceeeeeeeeeeeccccceeeeeehhhhhhhhhhhhhhhhhhhhhhh 

SEQ LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 

SEG 

PRD hhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhhhhhhhhhhhhcccccceee 

SEQ CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 

SEG xxxxxxxxxx 

PRD ehhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhheeeeecccccceeeecc 

SEQ IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 

SEG 

PRD cccchhhhhhhceeeeecccccccchhhhhhhhhhhccccccccccceeeeecccceeee 

SEQ QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 

SEG 

PRD hhhhhhhhcccccccccceeeeccccccccccchhhhhhccchhhhhhhhhhhhhccccc 

SEQ NIDE 

SEG .... 

PRD cccc 



(No Prosite data available for DKFZphtes3_8g5.3) 
(No Pfam data available for DKFZphtes3_8g5.3) 
DKFZphtes3_8m10 



group: nucleic acid management 

DKFZphtes3_8m10 encodes a novel 221 amino acid protein with strong similarity to 
polyadenylate-binding proteins. 

The poly(A)-binding protein (PABP) binds to the 3'-poly(A) tail found on most 
eukaryotic messenger RNAs (mRNAs) and, together with the poly(A) tail, has been implicated in governing the 
stability and the translation of mRNA. 

The new protein can find application in modulation of mRNA translation and 
processing/stability. 

strong similarity to polyadenylate-binding protein 

frame shift at bp 707-710 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2107 bp 

Poly A stretch at pos. 2052, polyadenylation signal at pos. 2033 
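
The two positions on the line above could be located with a simple scan. The sketch below is a hypothetical illustration, not the original annotation method: it assumes the canonical `AATAAA` hexamer as the polyadenylation signal and an arbitrary minimum run of 10 A residues as the poly(A) stretch, neither of which is stated in the source, and it reports 1-based positions.

```python
import re


def polya_features(seq, min_run=10):
    """Return (signal_positions, stretch_position), both 1-based.

    signal_positions: every start of the assumed AATAAA hexamer.
    stretch_position: start of the first run of >= min_run A residues, or None.
    """
    seq = seq.upper()
    signals = [m.start() + 1 for m in re.finditer(r"(?=AATAAA)", seq)]
    run = re.search(r"A{%d,}" % min_run, seq)
    stretch = run.start() + 1 if run else None
    return signals, stretch


# Toy sequence (not taken from the listing)
toy = "GCGC" + "AATAAA" + "GTCTGACTGACT" + "A" * 15
print(polya_features(toy))
```

Because the thresholds are assumptions, positions found this way need not coincide exactly with the values reported in the listing.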



1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC 
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA 
101 CCCCGACGTG ACTGAGGCGA TGCTCTACGA GAAGTTCAGC CCGGCAGGGC 
151 CCATCCTCTC CATCCGGATC TGCAGGGACT TGATCACCAG CGGCTCCTCC 
201 AACTACGCGT ATGTGAACTT CCAGCATACG AAGGACGCGG AGCATGCTCT 
251 GGACACCATG AATTTTGATG TTATAAAGGG CAAGCCAGTA CGCATCATGT 
301 GGTCTCAGCG TGATCCATCA CTTCGAAAAA GTGGAGTGGG CAACATATTC 
351 GTTAAAAATC TGGATAAGTC CATTAATAAT AAAGCACTGT ATGATACAGT 
401 TTCTGCTTTT GGTAACATCC TTTCGTGTAA CGTGGTTTGT GATGAAAATG 
451 GTTCCAAGGG TTATGGATTT GTACACTTTG AGACACACGA AGCAGCTGAA 
501 AGAGCTATTA AAAAAATGAA CGGAATGCTC CTAAATGGTC GCAAAGTATT 
551 TGTTGGACAA TTTAAGTCTC GTAAAGAACG AGAAGCTGAA CTTGGAGCTA 
601 GGGCAAAAGA GTTCCCCAAT GTTTACATCA AGAATTTTGG AGAAGACATG 
651 GATGATGAGC GCCTTAAGGA TCTCTTTGGC AAGTTCGGGC CCGCCTTAAG 
701 TGTGAATTAA TGACCGATGA AAGTGGAAAA TCCAAAGGAT TTGGATTTGT 
751 AAGCTTTGAA AGGCATGAAG ATGCACAGAA AGCTGTAGAT GAGATGAATG 
801 GAAAGGAGCT CAATGGAAAA CAAATTTACG TTGGTCGAGC TCAGAAAAAA 
851 GTGGAACGGC AGACGGAACT TAAGCGCACA TTTGAACAGA TGAAGCAAGA 
901 TAGGATCACC AGATACCAGG TTGTTAATCT TTATGTGAAA AATCTTGATG 
951 ATGGTATTGA TGATGAACGT CTCCGGAAAG CGTTTTCTCC ATTTGGTACA 
1001 ATCACTAGTG CAAAGGTTAT GATGGAAGGT GGTCGCAGCA AAGGGTTTGG 
1051 TTTTGTATGT TTCTCCTCCC CAGAAGAAGC CACTAAAGCA GTTACAGAAA 
1101 TGAACGGTAG AATTGTGGCC ACAAAGCCAT TGTATGTAGC TTTAGCTCAG 
1151 CGCAAAGAAG AGCGCCAGGC TTACCTCACT AACGAGTATA TGCAGAGAAT 
1201 GGCAAGTGTA CGAGCTGTGC CCAACCAGCG AGCACCTCCT TCAGGTTACT 
1251 TCATGACAGC TGTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
1301 AGCCAAATTG CTCGACTAAG ACCAAGTCCT CGCTGGACTG CTCAGGGTGC 
1351 CAGACCTCAT CCATTCCAAA ATAAGCCCAG TGCTATCCGC CCAGGTGCTC 
1401 CTAGAGTACC ATTTAGTACT ATGAGACCAG CTTCTTCACA GGTTCCACGA 
1451 GTCATGTCAA CGCAGCGTGT TGCTAACACA TCAACACAGA CAGTGGGTCC 
1501 ACGTCCTGCA GCTGCTGCTG CTGCTGCAGC TACCCCTGCT GTGCGCACGG 
1551 TTCCACGGTA TAAATATGCT GCGGGAGTTC GCAATCCTCA GCAACATCGT 
1601 AATGCACAGC CACAAGTTAC AATGCAACAG CTTGCTGTTC ATGTACAAGG 
1651 TCAGGAAACT TTGACTGCCT CCAGGTTGGC ATCTGCCCCT CCTCAAAAGC 
1701 AAAAGCAAAT GTTAGGTGAA CGGCTCTTTC CTCTTATTCA AGCCATGCAC 
1751 CCTACTCTTG CTGGGAAAAT CACTGGCATG TTGTTGGAGA TTGATAATTC 
1801 AGAACTTCTT TATATGCTCG AGTCTCCAGA GTCACTCCGT TCTAAGGTTG 
1851 ATGAAGCTGT AGCTGTACTA CAAGCCCACC AAGCTAAAGA GGCTACCCAG 
1901 AAAGCAGTTA ACAGTGCTAC CGGTGTTCCA ACTGTTTAAA ATTGATCAGA 
1951 GACCACGAAA AGAAATTTGT GCTTCACCGA AGAAAAATAT CTAAACATCG 
2001 AGAAACTATG GGAAAAAAAA TTGCAAAATC TAAAATAAAA AATGCAAAAT 
2051 CTAAAATAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2101 AAAAAGG 
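
A numbered listing such as the one above can be joined into a single string and checked against its row labels and the stated insert length. This is a minimal sketch under the assumption that each row is "offset block block ..." with a 1-based offset; the function name `parse_listing` is hypothetical. A row whose label has been split by extraction (for example `4 51` instead of `451`) fails the offset check rather than being silently accepted.

```python
def parse_listing(lines):
    """Join numbered sequence rows into one string, validating row offsets."""
    chunks = []
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        offset, blocks = int(fields[0]), fields[1:]
        if offset != total + 1:
            raise ValueError(f"row labelled {offset} but {total + 1} expected")
        chunk = "".join(blocks)
        chunks.append(chunk)
        total += len(chunk)
    return "".join(chunks)


rows = [
    "1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC",
    "",
    "51 CCCAGCACCC CCAGCTACCC",
]
print(len(parse_listing(rows)))
```

Applied to a complete listing, the length of the returned string can be compared with the "Insert length" line.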



BLAST Results 



Entry HSPOLYAB from database EMBL: 

Human mRNA for polyA binding protein 

Score = 5420, P = 0.0e+00, identities = 1162/1243 
Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 707 bp to 1936 bp; peptide length: 410 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (10-18) 
RNP_1 (112-120) 



1 LMTDESGKSK GFGFVSFERH EDAQKAVDEM NGKELNGKQI YVGRAQKKVE 

51 RQTELKRTFE QMKQDRITRY QVVNLYVKNL DDGIDDERLR KAFSPFGTIT 

101 SAKVMMEGGR SKGFGFVCFS SPEEATKAVT EMNGRIVATK PLYVALAQRK 

151 EERQAYLTNE YMQRMASVRA VPNQRAPPSG YFMTAVPQTQ NHAAYYPPSQ 

201 IARLRPSPRW TAQGARPHPF QNKPSAIRPG APRVPFSTMR PASSQVPRVM 

251 STQRVANTST QTVGPRPAAA AAAAATPAVR TVPRYKYAAG VRNPQQHRNA 

301 QPQVTMQQLA VHVQGQETLT ASRLASAPPQ KQKQMLGERL FPLIQAMHPT 

351 LAGKITGMLL EIDNSELLYM LESPESLRSK VDEAVAVLQA HQAKEATQKA 
401 VNSATGVPTV 
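
The two RNP_1 hits listed for this frame can be reproduced with a pattern search. The sketch below is hedged: the regular expression is an assumed transcription of the PROSITE RNP-1 octapeptide (PS00030) and should be verified against the PROSITE entry before it is relied on, and `rnp1_starts` is a hypothetical helper name. Positions are 1-based starts.

```python
import re

# Assumed form of PS00030: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]
RNP1 = re.compile(r"(?=([RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]))")


def rnp1_starts(peptide):
    """Return 1-based start positions of every RNP-1-like octapeptide."""
    return [m.start() + 1 for m in RNP1.finditer(peptide)]


# First 120 residues of the frame 2 peptide above
frame2 = (
    "LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVE"
    "RQTELKRTFEQMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTIT"
    "SAKVMMEGGRSKGFGFVCFS"
)
print(rnp1_starts(frame2))
```

The lookahead form reports overlapping matches as well, which is a design choice of this sketch rather than a documented property of the original motif scan.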

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 2 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931, 
P = 1.7e-199 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1928, P = 
3.6e-199 



>PIR:DNHUPA polyadenylate-binding protein - human 
Length = 633 

HSPs: 



Score = 1931 (289.7 bits), Expect = 1.7e-199, P = 1.7e-199 
Identities = 384/415 (92%), Positives = 394/415 (94%) 



Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
           +MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKR FE
Sbjct: 219 VMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFE 278

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFS 120
           QMKQDRITRYQ VNLYVKNLDDGIDDERLRK FSPFGTITSAKVMMEGGRSKGFGFVCFS
Sbjct: 279 QMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFS 338

Query: 121 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPN------Q 174
           SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQA+LTN+YMQRMASVRAVPN      Q
Sbjct: 339 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQ 398

Query: 175 RAPPSGYFMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRV 234
            APPSGYFM A+PQTQN AAYYPPSQ+A+LRPSPRWTAQGARPHPFQN P AIRP APR
Sbjct: 399 PAPPSGYFMAAIPQTQNRAAYYPPSQVAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRP 458

Query: 235 PFSTMRPASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNP 294
           PFSTMRPASSQVPRVMSTQRVANTSTQT+GPRPAAAAAAA TPAVRTVP+YKYAAGVRNP
Sbjct: 459 PFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAA-TPAVRTVPQYKYAAGVRNP 517

Query: 295 QQHRNAQPQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGK 354
           QQH NAQPQVTMQQ AVHVQGQE LTAS LASAPPQ+QKQMLGERLFPLIQAMHPTLAGK
Sbjct: 518 QQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGK 577

Query: 355 ITGMLLEIDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 410
           ITGMLLEIDNSELL+MLESPESLRSKVDEAVAVLQAHQAKEA QKAVNSATGVPTV
Sbjct: 578 ITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV 633




Score = 315 (47.3 bits), Expect = 1.9e-27, P = 1.9e-27 
Identities = 71/163 (43%), Positives = 102/163 (62%) 



Query:   1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
           ++ DE+G SKG+GFV FE  E A++A+++MNG  LN ++++VGR + + ER+ EL    +
Sbjct: 130 VVCDENG-SKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAK 188

Query:  61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMM-EGGRSKGFGFVCF 119
           +            N+Y+KN  + +DDERL+  F P     S KVM  E G+SKGFGFV F
Sbjct: 189 EF----------TNVYIKNFGEDMDDERLKDLFGP---ALSVKVMTDESGKSKGFGFVSF 235

Query: 120 SSPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQ 163
              E+A KAV EMNG+ +  K +YV  AQ+K ERQ  L  ++ Q
Sbjct: 236 ERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFEQ 279




Score = 214 (32.1 bits), Expect = 1.9e-14, P = 1.9e-14 
Identities = 50/150 (33%), Positives = 87/150 (58%) 




Query:   8 KSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQMKQDRI 67
           +S G+ +V+F++  DA++A+D MN   + GK + +  +Q    R   L+++
Sbjct:  50 RSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQ----RDPSLRKS--------- 96

Query:  68 TRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATK 127
               V N+++KNLD  ID++ L   FS FG I S KV+ +   SKG+GFV F + E A +
Sbjct:  97 ---GVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAER 153

Query: 128 AVTEMNGRIVATKPLYVALAQRKEERQAYL 157
           A+ +MNG ++  + ++V   + ++ER+A L
Sbjct: 154 AIEKMNGMLLNDRKVFVGRFKSRKEREAEL 183




Score = 120 (18.0 bits), Expect = 4.8e-04, P = 4.8e-04 
Identities = 30/99 (30%), Positives = 54/99 (54%) 




Query:  70 YQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVM--MEGGRSKGFGFVCFSSPEEATK 127
           Y + +LYV +L   + +  L + FSP G I S +V   M   RS G+ +V F  P +A +
Sbjct:   8 YPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAER 67

Query: 128 AVTEMNGRIVATKPLYVALAQRKEE-RQAYLTNEYMQRM 165
           A+  MN  ++  KP+ +  +QR    R++ + N +++ +
Sbjct:  68 ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNL 106








Peptide information for frame 3 





ORF from 45 bp to 707 bp; peptide length: 221 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (138-146) 



1 MNPSTPSYPT ASLYVGDLHP DVTEAMLYEK FSPAGPILSI RICRDLITSG 

51 SSNYAYVNFQ HTKDAEHALD TMNFDVIKGK PVRIMWSQRD PSLRKSGVGN 

101 IFVKNLDKSI NNKALYDTVS AFGNILSCNV VCDENGSKGY GFVHFETHEA 

151 AERAIKKMNG MLLNGRKVFV GQFKSRKERE AELGARAKEF PNVYIKNFGE 
201 DMDDERLKDL FGKFGPALSV N 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 3 

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1)., N = 1, Score = 1039, P = 5.7e-105 

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1031, P = 
4e-104 

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1009, 
P = 8.7e-102 



>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING 
PROTEIN 1) (PABP 1). 

Length = 636 

HSPs: 
Score = 1039 (155.9 bits), Expect = 5.7e-105, P = 5.7e-105 
Identities = 199/220 (90%), Positives = 205/220 (93%) 



Query:   1 MNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQ 60
           MNPS PSYP ASLYVGDLHPDVTEAMLYEKFSPAGPILSIR+CRD+IT  S  YAYVNFQ
Sbjct:   1 MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQ 60

Query:  61 HTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNLDKSINNKALYDTVS 120
              DAE ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIF+KNLDKSI+NKALYDT S
Sbjct:  61 QPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFS 120

Query: 121 AFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSRKERE 180
           AFGNILSC VVCDENGSKGYGFVHFET EAAERAI+KMNGMLLN RKVFVG+FKSRKERE
Sbjct: 121 AFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKERE 180

Query: 181 AELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSV 220
           AELGARAKEF NVYIKNFGEDMDDERLKDLFGKFGPALSV
Sbjct: 181 AELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSV 220




Score = 275 (41.3 bits), Expect = 4.1e-23, P = 4.1e-23 
Identities = 71/233 (30%), Positives = 120/233 (51%) 




Query:   2 NPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 61
           +PS       ++++ +L   +    LY+ FS  G ILS ++  D   S    + +   Q
Sbjct:  90 DPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQE 149

Query:  62 TKD-AEHALDTMNFDVIKGKPVRIMW-SQRDPSL--RKSGVGNIFVKNLDKSINNKALYD 117
             + A   ++ M  +  K    R     +R+  L  R     N+++KN  + ++++ L D
Sbjct: 150 AAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKD 209

Query: 118 TVSAFGNILSCNVVCDENG-SKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSR 176
               FG  LS  V+ DE+G SKG+GFV FE HE A++A+ +MNG  LNG++++VG+ + +
Sbjct: 210 LFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKK 269

Query: 177 KEREAELGARAKEFP----------NVYIKNFGEDMDDERLKDLFGKFGPALS 219
            ER+ EL  + ++            N+Y+KN  + +DDERL+  F  FG   S
Sbjct: 270 VERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITS 322




Score = 227 (34.1 bits), Expect = 6.3e-18, P = 6.3e-18 
Identities = 57/187 (30%), Positives = 101/187 (54%) 




Query:  12 SLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQHTKDAEHALDT 71
           ++Y+ +   D+ +  L + F   GP LS+++  D  +  S  + +V+F+  +DA+ A+D
Sbjct: 192 NVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDE-SGKSKGFGFVSFERHEDAQKAVDE 250

Query:  72 MNFDVIKGKPVRIMWSQR-----------------DPSLRKSGVGNIFVKNLDKSINNKA 114
           MN   + GK + +  +Q+                 D   R  GV N++VKNLD  I+++
Sbjct: 251 MNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGV-NLYVKNLDDGIDDER 309

Query: 115 LYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFK 174
           L    S FG I S  V+ +   SKG+GFV F + E A +A+ +MNG ++  + ++V   +
Sbjct: 310 LRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQ 369

Query: 175 SRKEREAEL 183
            ++ER+A L
Sbjct: 370 RKEERQAHL 378




Score = 100 (15.0 bits), Expect = 2.3e-02, P = 2.3e-02 
Identities = 26/99 (26%), Positives = 53/99 (53%) 




Query:   8 YPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSG-SSNYAYVNFQHTKDAE 66
           Y   +LYV +L   + +  L ++FSP G I S ++   ++  G S  + +V F   ++A
Sbjct: 291 YQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKV---MMEGGRSKGFGFVCFSSPEEAT 347

Query:  67 HALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 106
            A+  MN  ++  KP+ +  +QR    R++ + N +++ +
Sbjct: 348 KAVTEMNGRIVATKPLYVALAQRKEE-RQAHLTNQYMQRM 386





Pedant information for DKFZphtes3_8m10, frame 2 



Report for DKFZphtes3_8m10.2 



[LENGTH] 409 

[MW] 45235.68 

[pI] 10.08 

[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 0.0 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-54 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 1e-15 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-12 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YNL175c] 4e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 5e-08 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YHR086w] 3e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 9e-07 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 3e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 8e-06 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 2e-05 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 3e-05 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 1e-17 
[PIRKW] nucleus 0.0 
[PIRKW] duplication 0.0 
[PIRKW] RNA binding 0.0 
[PIRKW] nucleolus 2e-09 
[PIRKW] tandem repeat 2e-09 
[PIRKW] single-stranded DNA binding 3e-06 
[PIRKW] DNA binding 5e-13 
[PIRKW] phosphoprotein 6e-10 
[PIRKW] ribosome 3e-08 
[PIRKW] mitochondrion 3e-08 
[PIRKW] alternative splicing 9e-11 
[PIRKW] chloroplast 2e-19 
[PIRKW] transcription regulation 2e-07 
[PIRKW] protein biosynthesis 3e-08 
[SUPFAM] nucleolin 6e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 2e-19 
[SUPFAM] polyadenylate-binding protein 0.0 
[SUPFAM] ribonucleoprotein repeat homology 0.0 
[PROSITE] RNP_1 2 
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.62 % 



SEQ MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ 

SEG 

1sxl- 

SEQ MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS 

SEG 

1sxl- CEEEECCCTTTTHHHHHHHHTTTTCCCCCEEECTTTCTTTEEEECTTT 

SEQ PEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPNQRAPPSGY 

SEG 

1sxl- HHHHHHHHHHHTTTCCCCCCCBCCBCC 

SEQ FMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRVPFSTMRP 

SEG 

1sxl- 

SEQ ASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNPQQHRNAQ 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

1sxl- 

SEQ PQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGKITGMLLE 

SEG 

1sxl- 

SEQ IDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 

SEG 

1sxl- 
Prosite for DKFZphtes3_8m10.2 

PS00030 9->17 RNP_1 PDOC00030 
PS00030 111->119 RNP_1 PDOC00030 



Pfam for DKFZphtes3_8m10.2 



HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
+YV+NL+ +++E LR +FS+FG I+S+++M+ E GRS+GF+FV F + 
Query 74 LYVKNLDDGIDDERLRKAFSPFGTITSAKVMM--EGGRSKGFGFVCFSS 120 

HMM EEDAekAIdeMNGmeFmGRrIRV* 
+E+A+KA+ EMNG+++ ++++V 
Query 121 PEEATKAVTEMNGRIVATKPLYV 143 



Pedant information for DKFZphtes3_8m10, frame 3 



Report for DKFZphtes3_8m10.3 



[LENGTH] 235 
[MW] 26308.08 
[pI] 8.95 
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1). 1e-113 
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-64 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 1e-24 
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 2e-19 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YOR319w] 2e-14 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-11 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 1e-09 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 1e-09 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 8e-09 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 2e-08 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 2e-08 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-08 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 3e-04 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 3e-04 
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins 
[BLOCKS] BL00900D Bacteriophage-type RNA polymerase family proteins signatur 
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 9e-23 
[SCOP] d2u1a 4.34.7.1.2 U1A protein [human (Homo sapiens) 6e-24 
[SCOP] d1up1_2 4.34.7.1.1 Nuclear ribonucleoprotein A1, RNP A1, UP 1e-13 
[PIRKW] nucleus 1e-110 
[PIRKW] duplication 1e-110 
[PIRKW] RNA binding 1e-110 
[PIRKW] nucleolus 4e-10 
[PIRKW] tandem repeat 4e-10 
[PIRKW] single-stranded DNA binding 1e-06 
[PIRKW] DNA binding 9e-12 
[PIRKW] phosphoprotein 4e-10 
[PIRKW] mitochondrion 6e-07 
[PIRKW] heterotrimer 4e-06 
[PIRKW] alternative splicing 1e-15 
[PIRKW] chloroplast 5e-11 
[PIRKW] transcription regulation 3e-09 
[PIRKW] GTP binding 2e-06 
[SUPFAM] helix-destabilizing protein 1e-07 
[SUPFAM] nucleolin 4e-10 
[SUPFAM] glycine-rich RNA-binding protein 2e-07 
[SUPFAM] yeast HRP1 protein 2e-08 
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-25 

[SUPFAM] polyadenylate-binding protein 1e-112 

[SUPFAM] ribonucleoprotein repeat homology 1e-112 

[PROSITE] RNP_1 1 

[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain) 

[KW] All_Beta 
[KW] 3D 

SEQ ERSRLVCLRAAVPRMNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDL 
1ha1- EEEETTTTTTCHHHHHHHHGGGCCEEEEEEEETT 

SEQ ITSGSSNYAYVNFQHTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 
1ha1- TTTCEEEEEEEEECCHHHHHHHHHHTTEEE-TT EEEEEEECTTTTCCCCCEEEEECC 

SEQ DKSINNKALYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGR 
1ha1- TTTTCHHHHHHHHGGGCCEEEEEEEETTTTTCEEEEEEECCHHHHHHHH 

SEQ KVFVGQFKSRKEREAELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSVN 
1ha1- 

Prosite for DKFZphtes3_8m10.3 
PS00030 152->160 RNP_1 PDOC00030 

Pfam for DKFZphtes3_8m10.3 

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain) 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
+YVG+L +D+TE +L + FS+ GPI+SIR+ RD T S +A+V+F+ 
Query 27 LYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 75 

HMM EEDAekAIdeMNGmeFmGRrlRV* 
DAE A+D+MN ++ G+++R+ 
Query 76 TKDAEHALDTMNFDVIKGKPVRI 98 

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED 
I+V+NL+ +++ L D S FG I+S++++ D + S+G++FV FE+ 
Query 115 IFVKNLDKSINNKALYDTVSAFGNILSCNVVCD--ENGSKGYGFVHFET 161 

HMM EEDAekAIdeMNGmeFmGRrlRV* 
+E+AE+AI +MNGM+++GR++ V 
Query 162 HEAAERAIKKMNGMLLNGRKVFV 184 
DKFZphtes3_8p7 



group: testes derived 

DKFZphtes3_8p7 encodes a novel 412 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

2 EST hits (both from testis libraries) 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2899 bp 

Poly A stretch at pos. 2870, polyadenylation signal at pos. 2852 



1 CCGACCCGCC CTGGGGTGCT GCGTGCGCTG CCTGCTCCCG CCTGAGGAAA 

51 ACACTGCCCA TGGCGCAAGG CCGGGAGCGC GACGAAGGCC CCCACTCCGC 

101 CGGCGGCGCG TCCTTGTCCG TGAGATGGGT GCAAGGATTC CCTAAGCAGA 

151 ATGTTCATTT GTCAACGACA ACACCATTTG CTACCCTTGT GGGAATTATG 

201 TAATATTTAT TAATATTGAA ACCAAGAAAA AGACTGTACT GCAGTGTAGT 

251 AATGGAATTG TGGGCGTCAT GGCAACTAAC ATCCCCTGTG AAGTTGTGGC 

301 TTTTTCTGAC CGGAAGCTAA AACCTCTCAT CTACGTATAC AGCTTTCCAG 

351 GATTGACCAG AAGGACCAAA TTGAAAGGCA ACATTCTCCT GGACTACACT 

401 TTACTTTCAT TCAGTTACTG TGGCACCTAC CTGGCTAGTT ACTCCTCTCT 

451 CCCAGAATTT GAACTGGCCC TTTGGAACTG GGAATCGAGT ATCATTTTGT 

501 GTAAGAAATC ACAGCCTGGA ATGGATGTGA ACCAAATGTC TTTTAACCCC 

551 ATGAACTGGC GCCAGCTGTG CTTATCAAGT CCAAGTACAG TGAGCGTGTG 

601 GACCATTGAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGGTCGGTGA 

651 AATTACCTCT AGAAGATGGG TCATTTTTTA ATGAAACGGA TGTCGTTTTC 

701 CCCCAGTCGT TGCCGAAAGA TCTCATCTAT GGTCCCGTGC TGCCACTGTC 

751 AGCCATTGCC GGGCTGGTAG GCAAAGAGGC AGAGACTTTC CGGCCGAAAG 

801 ATGATCTATA TCCTTTGCTT CACCCGACTA TGCATTGCTG GACTCCAACA 

851 AGTGACTTGT ACATTGGCTG TGAAGAGGGT CATCTTTTAA TGATTAATGG 

901 AGACACCTTG CAAGTGACTG TACTTAATAA GATAGAAGAG GAATCGCCAT 

951 TGGAAGACAG AAGAAATTTT ATCAGTCCAG TAACCTTGGT ATATCAGAAG 

1001 GAGGGCGTGC TGGCTTCTGG AATTGATGGC TTTGTGTATT CTTTTATTAT 

1051 TAAAGATAGA AGTTACATGA TCGAGGATTT TCTTGAGATT GAAAGACCTG 

1101 TAGAACATAT GACATTTTCT CCCAATTATA CAGTGTTGCT GATTCAAACA 

1151 GACAAGGGAT CTGTTTATAT CTACACTTTT GGTAAGGAGC CAACCTTAAA 

1201 TAAAGTCCTA GATGCTTGTG ATGGGAAATT TCAGGCAATT GACTTTATCA 

1251 CACCTGGAAC CCAATACTTC ATGACACTTA CATATTCAGG GGAAATTTGT 

1301 GTTTGGTGGC TGGAGGATTG TGCTTGTGTA AGCAAGATTT ATCTGAATAC 

1351 CCTAGCAACG GTTCTGGCTT GCTGTCCATC CTCCCTCTCT GCAGCCGTGG 

1401 GCACGGAGGA TGGCTCGGTC TACTTCATCA GCGTATATGA TAAGGAATCC 

1451 CCTCAGGTCG TGCACAAGGC CTTTCTCTCG GAATCGTCCG TGCAGCACGT 

1501 CGTGTAAGTC CTTTCTGCCT CCAGGAGCGG CTCCGTGTCA CACCCGTCTG 

1551 TTGAAAATTC TAGTGAAGCC ATCCTTTCTT TTAATTTTAA GTTTTACGTG 

1601 TTTCATTTGT TTTGAATGTT AATATATTCA CACAGTTCAA CACTCAAAAG 

1651 GTACAGAGGG CTGTGTAGTA AAGTACCCCC CATACCCAGG TCTGTCCTTG 

1701 CAGGCAGCCT GGTACCAATT TCTCATGTCT CTCCTGAGAT GTTTTATCCA 

1751 TGAACAAGCA AAACATAATA AGCACTTCTT TTTACTTGTA TCAATGGCCA 

1801 TCATGTGTGT ATAGTGTGCC AGGCACTTCT GCTGTATTAA CTCCATGAGG 

1851 TAAACACTCT TGTTGTCTCT ATTTGACAGG TGAGGAAGAT AAGGCACAAG 

1901 GATTTTAAAT AACTTGCTCA ATAGTACACA GATAGTGAAT GGCAAATGTT 

1951 GGGATTTGAA CCCAGGTAGT TGGGCTGCAG AGTCACTGCC TTTGCTCTTA 

2001 AAAGGAGAAA ACTATGTACA ATGCCTCATT TCTTTTTTCA CTTAATCGTA 

2051 TATCTTGGAG AATGTTTTAT ATCCACACAT AAAGACCAGC CTGATTATTT 

2101 GTATAGCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 

2151 ACGGTATATT AATGAACATT TAGAGTATTT CAAAACTTTT GAAGCAATAC 

2201 TTTTAAGATG ATAATATAGA GACATTAGAT TTGGACTTGT AGGTGCTATC 

2251 ATTATTACTG TTTCTTTTTA ATTTATTATA TTATTAGGTA TTAATAAGAA 

2301 CAGACATTTG TATTCTGCTT TACAGCTTGA GATCACTGTA GCTTGTGGCA 

2351 TGTGATCCTC AAAACACCAG TCAGAAAGGT GTTATTCTTA TCCCTATTAG 

2401 ACAAATTAGG GAATTCAGGG TTAGAGAGGT GAGGAAAAGC ATTGTCCAAG 

2451 ATTACACATT ACACAGCTAG CACACTGAGG AGCTGGCCCT GCCACTGTGG 

2501 ACTGCCCAGC TCCACCACCC TAGCTCAGTG GGGAAGGATG GATAACCTCC 

2551 TTCCATTTAC CCCCTGCCTT TCTGCACTGT CATTTTTTTG TGCCTTTCCT 

2601 TTCTCAGATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTG 

2651 ATAAAGTTGT AGACATGTTT CACTACATTC TTCCTCCCAC TGCCAGGTAC 

2701 CAGACACAGG GTAATGAAAT GTCACACCCA CCACTAATTT GAGAATTGCT 
2751 TATTTGCGCT TGAAACATCA AGAAAGCTCT ACCGACAGAC ATGTTTCATT 
2801 CACTTATGAT GAACCAACTG CCCATCTTTA CTGAATCTTC TTGACTGTAT 
2851 TTATTAAAGT TGCAATTTGG AAATAAAAAA AAAAAAAAAA AAAAAAAGG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 269 bp to 1504 bp; peptide length: 412 
Category: putative protein 
Classification: no clue 



1 MATNIPCEVV AFSDRKLKPL IYVYSFPGLT RRTKLKGNIL LDYTLLSFSY 
51 CGTYLASYSS LPEFELALWN WESSIILCKK SQPGMDVNQM SFNPMNWRQL 
101 CLSSPSTVSV WTIERSNQEH CFRARSVKLP LEDGSFFNET DVVFPQSLPK 
151 DLIYGPVLPL SAIAGLVGKE AETFRPKDDL YPLLHPTMHC WTPTSDLYIG 
201 CEEGHLLMIN GDTLQVTVLN KIEEESPLED RRNFISPVTL VYQKEGVLAS 
251 GIDGFVYSFI IKDRSYMIED FLEIERPVEH MTFSPNYTVL LIQTDKGSVY 
301 IYTFGKEPTL NKVLDACDGK FQAIDFITPG TQYFMTLTYS GEICVWWLED 
351 CACVSKIYLN TLATVLACCP SSLSAAVGTE DGSVYFISVY DKESPQVVHK 
401 AFLSESSVQH VV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8p7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_8p7, frame 2 



Report for DKFZphtes3_8p7.2 



[LENGTH] 412 

[MW] 46476.62 

[pi] 4.91 

[KW] Alpha_Beta 



SEQ MATNIPCEVVAFSDRKLKPLIYVYSFPGLTRRTKLKGNILLDYTLLSFSYCGTYLASYSS 

PRD cccccceeeeeecccccceeeeeecccccccccccchhhhhhhheeeecccccccccccc 

SEQ LPEFELALWNWESSIILCKKSQPGMDVNQMSFNPMNWRQLCLSSPSTVSVWTIERSNQEH 

PRD cchhhhhhhhccccceeeccccccccceeeccccccceeeeeccccceeeeeeeecchhh 

SEQ CFRARSVKLPLEDGSFFNETDVVFPQSLPKDLIYGPVLPLSAIAGLVGKEAETFRPKDDL 

PRD hhhhhhhcccccccccccccccccccccccccccccccceeeeeeccccccccccccccc 

SEQ YPLLHPTMHCWTPTSDLYIGCEEGHLLMINGDTLQVTVLNKIEEESPLEDRRNFISPVTL 

PRD cccccccccccccccceeeecccceeeecccceeeeeehhhhhcccccccccccccccee 

SEQ VYQKEGVLASGIDGFVYSFIIKDRSYMIEDFLEIERPVEHMTFSPNYTVLLIQTDKGSVY 

PRD eeeceeeeecccceeeeeeeeeccchhhhhhhhhhcccceeeccccceeeeeecccccee 

SEQ IYTFGKEPTLNKVLDACDGKFQAIDFITPGTQYFMTLTYSGEICVWWLEDCACVSKIYLN 

PRD eeeccccccchhhhhcccccceeeeeccccceeeeeeeccceeeeeeecceeeeeeeehh 

SEQ TLATVLACCPSSLSAAVGTEDGSVYFISVYDKESPQVVHKAFLSESSVQHVV 

PRD hhhhhhhccccccceeeeccccceeeeeeeccccccchhhhhhhcccccccc 



(No Prosite data available for DKFZphtes3_8p7.2) 
(No Pfam data available for DKFZphtes3_8p7.2) 
DKFZphtes3_9e22 



group: testes derived 

DKFZphtes3_9e22 encodes a novel 227 amino acid protein with weak partial similarity to 
RING-finger proteins. 

For the novel protein, Pfam, but not Prosite, predicts a C3HC4-type RING finger motif. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to zinc finger proteins 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1318 bp 

Poly A stretch at pos. 1308, no polyadenylation signal found 



1 GCTCCCCCGG CTTTCGGAGC CCGGGGGCGG CCTGTGGCGC GCGGAGCCCG 
51 CGCCGGACTG CGCCTCTTTG GACCTTGAGG GGAAACATGC GTTTGCCTTG 
101 GATCGTTTGA AATTCTAAGT TTGGGATCCC CGCCCGCCCG CCTGCCTCTT 
151 CCGCCCCGCG GGTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC 
201 CCTCCGGGTC TCCTTTTTGA CTCCCTCCCC CTTTATGCTC GCCCAGCCCT 
251 CCCCCTGCTG CTGAGAAGTG GGGGAGGGTC TCGGCCTCCA GGTTCCCGCC 
301 CCACCGGGGC CCGGGCGAGC ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC 
351 TCCCGGGGCC CCTTCCCGGG GGTCTCCACC GATGACAGCG CCGTGCCGCC 
401 GCCGGGAGGG GCGCCCCATT TCGGGCACTA CCGGACGGGC GGCGGGGCCA 
451 TGGGGCTGCG CAGCCGCTCG GTCAGCTCGG TGGCAGGCAT GGGCATGGAC 
501 CCCAGCACGG CCGGGGGGGT GCCCTTTGGC CTCTACACCC CCGCCTCCCG 
551 GGGCACCGGC GACTCCGAGA GGGCGCCCGG CGGCGGAGGG TCTGCGTCCG 
601 ACTCCACCTA TGCCCATGGC AATGGTTACC AGGAGACGGG CGGCGGTCAC 
651 CATAGAGACG GGATGCTGTA CCTGGGCTCC CGAGCCTCGC TGGCGGATGC 
701 TCTACCTCTG CACATCGCAC CCAGGTGGTT CAGCTCGCAT AGTGGTTTCA 
751 AGTGCCCCAT TTGCTCCAAG TCTGTGGCTT CTGACGAGAT GGAAATGCAC 
801 TTTATAATGT GTTTGAGCAA ACCTCGCCTC TCCTACAACG ATGATGTGCT 
851 GACTAAAGAC GCGGGTGAGT GTGTGATCTG CCTGGAGGAG CTGCTGCAGG 
901 GGGACACGAT AGCCAGGCTG CCCTGCCTGT GCATCTATCA CAAAAGCTGC 
951 ATAGACTCGT GGTTTGAAGT GAACAGATCT TGTCCGGAAC ACCCTGCGGA 
1001 CTGACCTGCG GGCTTGCTTG CTGACTCCTC TCAAAGGGAC AGAGCGCCCC 
1051 TGCTCCAGGG AGGAGGCTCA CCGGACCCTG GGGCAGAGCT GAGCTTGGGA 
1101 CACCAGCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT 
1151 CTCCCTTCCT CCCTGAGGAC ACCAAATTGG ATGAGAGCAA GTTTGAGAGA 
1201 AGAATGAATC AACTGCTATC CTTCCCCTCA CCCCTCAGCC CAGGAGGGAA 
1251 AGGGCATTTT CTTTTTCATC TTTGAAAGGC ATTGTGGGTC TGTCTTTAAA 
1301 GTGTTTACAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 321 bp to 1001 bp; peptide length: 227 
Category: similarity to known protein 
Classification: unclassified 
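
As a quick consistency check on the ORF annotation above, the following sketch (not part of the original listing) translates the first 30 nucleotides of the ORF, copied from positions 321 to 350 of the insert sequence, using the standard genetic code, and derives the peptide length from the ORF coordinates. The names `translate` and `peptide_length` and the compact codon-table construction are illustrative assumptions, not tools named in the report.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    peptide = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        residue = CODON_TABLE[dna[i:i + 3].upper()]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


# Nucleotides 321-350 of the DKFZphtes3_9e22 insert, as listed above.
ORF_5PRIME = "ATGGGGGGCAAGCAGAGCACGGCGGCCCGC"

print(translate(ORF_5PRIME))      # MGGKQSTAAR
print(peptide_length(321, 1001))  # 227
```

The translated fragment agrees with the first ten residues of the peptide listed below, and (1001 - 321 + 1) / 3 gives the stated length of 227 residues.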



1 MGGKQSTAAR SRGPFPGVST DDSAVPPPGG APHFGHYRTG GGAMGLRSRS

51 VSSVAGMGMD PSTAGGVPFG LYTPASRGTG DSERAPGGGG SASDSTYAHG

101 NGYQETGGGH HRDGMLYLGS RASLADALPL HIAPRWFSSH SGFKCPICSK

151 SVASDEMEMH FIMCLSKPRL SYNDDVLTKD AGECVICLEE LLQGDTIARL

201 PCLCIYHKSC IDSWFEVNRS CPEHPAD

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9e22, frame 3 

TREMBL:AF078823_1 product: "RING-H2 finger protein RHA2b"; Arabidopsis
thaliana RING-H2 finger protein RHA2b mRNA, complete cds., N = 1, Score
= 111, P = 2.8e-06

TREMBL:AF078822_1 product: "RING-H2 finger protein RHA2a"; Arabidopsis
thaliana RING-H2 finger protein RHA2a mRNA, complete cds., N = 1, Score
= 112, P = 6.6e-06

TREMBL:AC004138_14 gene: "T17M13.17"; Arabidopsis thaliana chromosome
II BAC T17M13 genomic sequence, complete sequence., N = 2, Score = 123,
P = 1.4e-05

PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana, N = 1,
Score = 142, P = 8.8e-08



>PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 
Length = 327 

HSPs: 

Score = 142 (21.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 24/57 (42%), Positives = 30/57 (52%) 

Query: 166 SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP 222 

S P + LT D +C +C+EE + G LPC IYHK CI W + N SCP 

Sbjct: 206 SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP 262
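
The "Identities = 24/57 (42%)" figure reported for this HSP can be reproduced by counting matching columns in the ungapped alignment. The sketch below is illustrative only; the function name `identity` is an assumption, and the "Positives" count is not reproduced because it depends on the substitution matrix used by BLAST, which the report does not state.

```python
# Aligned segments copied from the HSP above (no gaps in this alignment).
QUERY = "SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP"
SBJCT = "SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP"


def identity(a: str, b: str) -> tuple[int, int, int]:
    """Return (matches, alignment length, rounded percent identity)."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a), round(100 * matches / len(a))


print(identity(QUERY, SBJCT))  # (24, 57, 42)
```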



Pedant information for DKFZphtes3_9e22, frame 3



Report for DKFZphtes3_9e22.3



[LENGTH] 227

[MW] 23782.62

[pi] 6.18

[HOMOL] PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 2e-08

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR313c] 4e-06

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOL013c] 0.001

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular



SEQ MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccceeeccccccc

SEQ PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeech 

SEQ RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD 

PRD hhhhhhhhceeecccccccccccccccccccchhhhhhhhhhhhcccccccccccccccc 

SEQ AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD 

PRD cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc 



(No Prosite data available for DKFZphtes3_9e22.3)
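
The absence of a Prosite hit can be illustrated with the C3HC4 RING finger signature listed in the Prosite Key of this document, C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]. The following is a minimal sketch, assuming a simplified converter (`prosite_to_regex`, a hypothetical helper) that handles only the pattern elements used in that key: single residues, `x`, `[...]`, `{...}` and `(n)` or `(n,m)` repeats. It is not the scanning program used to generate the report.

```python
import re

# One PROSITE element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)


# Peptide of DKFZphtes3_9e22, frame 3, as listed in this report.
PROTEIN = (
    "MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD"
    "PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS"
    "RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD"
    "AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD"
)

RING_SIGNATURE = "C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]."
ring_regex = prosite_to_regex(RING_SIGNATURE)

print(ring_regex)                      # C.H.[LIVMFY]C.{2}C[LIVMYA]
print(re.search(ring_regex, PROTEIN))  # None
```

In this peptide the histidine of the candidate finger sits three residues after the nearest cysteine (C-I-Y-H) rather than two, so the strict signature does not match, which is in line with the report that Pfam, but not Prosite, predicts the RING finger.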



Pfam for DKFZphtes3_9e22.3



HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC* 

C IC L+++ D++ LPC+ ++ ++CI +W CP+ 

Query 184 CVIC LEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEH 224
DKFZphtes3_9i20
group: testes derived 

DKFZphtes3_9i20 encodes a novel 205 amino acid protein with similarity to the human KIAA0336 gene.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 

Locus: /map="44.1 cR from top of Chr17 linkage group"
Insert length: 2509 bp

Poly A stretch at pos. 2499, polyadenylation signal at pos. 2481

1 CTCGCCGAGA TGACCTGGGC ACCTCTGCGT TGAATCGGCA AATACTGATC 

51 AAGCCGCATT TATTCTGCTC TCAGGAACTC TAAGTCTAGC AGAGAAGATG 

101 AGGCGGTAGA AGTTCATCAA TGGCTTGGCT GGAGGACAAG CAAATTGAGG 

151 ACATTGGCAA CGGAGTGATC AAAATGATAG ATCATGAGGC CTAAAATGAA 

201 TAAGGAAAGA AGAGAAGTGG CAGAGGCTGA GAACAGAAAG AGAGGGTGGA 

251 GGGGCTGTAA ATCTTGAAGA TTAGGGTATA ATATGAGTAT ATGGGTAAGA 

301 ATTGGAAGAA TTGTGTAGGA GGCAGTAGTC AAAAAGTAGA AGCAGTTTGG 

351 AAGAGTAGTT ACAAATATCA AGAGCCAGGT GGCTAAAAGG TGGAGCTATA 

401 GGTCATTGAA GCTCAAGAAA CTGAGTCTCT AGGGCATTGG TTAAGTCATC 

451 TGTCTAGACT TCAAAGTTGT CTAGGATGAT AATTCAGAAG ACTGATCTGT 

501 GCCAAAGTCA CAGGTTTTTC ACGACTGAAA ACAACATAGC AAAATAAGCC 

551 AAGATGTCTG TGGATCCAAT GACCTACGAG GCCCAGTTCT TTGGCTTCAC 

601 GCCACAAACG TGCATGCTTC GGATCTACAT TGCATTTCAA GACTACCTAT 

651 TTGAAGTGAT GCAGGCCGTT GAACAGGTTA TTCTGAAGAA GCTGGATGGC 

701 ATCCCAGACT GTGACATTAG CCCAGTGCAG ATTCGCAAAT GCACAGAGAA 

751 GTTTCTTTGC TTCATGAAAG GACATTTTGA TAACCTTTTT AGCAAAATGG 

801 AGCAACTGTT TTTGCAGCTG ATTTTACGTA TTCCCTCAAA CATCTTGCTT 

851 CCTGAAGATA AATGTAAGGA GACACCTTAT AGTGAGGAAG ATTTTCAGCA 

901 TCTCCAGAAA GAAATTGAAC AGTTACAGGA GAAGTACAAG ACTGAATTAT 

951 GTACTAAGCA GGCCCTTCTT GCAGAATTAG AAGAGCAAAA AATTGTTCAG 

1001 GCCAAACTCA AACAGACGTT GACTTTCTTT GATGAGCTTC ATAATGTTGG 

1051 CAGAGATCAT GGGACTAGTG ATTTTAGGGA GAGTTTAGTA TCCCTGGTTC 

1101 AGAACTCCAG AAAACTACAG AACATTAGAG ACAATGTGGA AAAGGAATCG 

1151 AAACGACTGA AAATATCTTA ATTGCTCAGT AGTCAAAAGG AGGAGCCTGT 

1201 CAAAAAGTAG AATCATAAGG ACTGTTCAAA CCATAAGGAC TGTTCAAATC 

1251 ATACCAGTGA CTGTTCAAAC CAACCATACT TTTTATTAGA TTTGCTTTGT 

1301 CAACTCTTTC TTGTATTCTG TGTTTTCCTC TTTTTTGGTC CACTTTGCTG 

1351 AGGTATGAAG TGTACTACTT TGAACTAGGC TGAAGCATCT GAGTCTTCTA 

1401 ATAAGTGGGA AGGGATCCAA CAAAGAAGCC ATGACCAGTT AAAGATATTT 

1451 GCAGAGTTAC ACCTTGGTCA TAAGTCCTTT GTGACCTTGA TTATTTTGGC 

1501 TTACTCTTTG GATGAGACCA GACAAGAAAA GGATTAAACG GGTGGCTCCT 

1551 TTAATATTAT TATTATTGTT TTTGAGACAA GGTCCCTTTC TGTCACCCAG 

1601 GTTAGAGTAG ATTTCAGTGG CACAATCTTG GCTCACTGCA ACCTCTGTGT 

1651 CCTGGGCTCA AGTGATCCTC CTGCCTCAGC CTCCCAAGTA GCTAGGACCA 

1701 CAGGTGCGTG TCACCATGCT TGGCTAATTT TTTTGCAGAA ACGAGGCCTC 

1751 ACTATATTGT CCAGGCTGAG TGGCTCTTTT ATTAACCAGT CATTACACTG 

1801 CGGAACAGCC AACATAGAGT ACTTGCTCTC GTCCTGTGAA TTTTCTTTCA 

1851 TGAGGGAGTC AATATGTAGT GGAAAGAAGC ATGTAGCAAA AAAGACAACC 

1901 TTGATCTTTA ATAAAAAAGA AGTTGGTTTA TTTCCAAAAT AAATCCCCTG 

1951 ACAAAAAACC TGGTGATGTT AAGCAATTGA CTGTCTTAGA GTCCAGCAGA 

2001 AGACCTTAGA CAAAAAAAGC AGAACCCACT GGAGTAGAAA AGGAAGCATG 

2051 TAGCATATAC TCAGTAGTGA AATTTAATTT TACTGACTGT TAGGTATCTA 

2101 TGCCAATTTG TTTTCATACT TCAGTTGGTT TTGGAATCTG CCTTATACCT 

2151 AATATTTATT TATTCACACT CATAAGCATC AAATATTTAA TGCCCTCAGT 

2201 GGGAAATTTG TGTTTAAACT CAATGGAATC TAATATTTCT TTATGTCGTT 

2251 AGTCCCTGTA AAATGTTAGG TCACCCAAGG AAAGGGGAGA AATAGCAATG 

2301 GTTGTTCCTA AGGTATTGCT TGCCCTCCAT GTCTTCCTAA AGAGCAGAAC 

2351 TTGGAGTTTC TCCTTTATGT AGAGAAGAAG TAACTTAGGG TGTATTTGCA

2401 ATGAAATATT CATAGATATT GAAAGCTTGT GTTTACATGA AATATGTTTA 

2451 TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA 
2501 AAAAAAAAA 



BLAST Results

Entry AC004148 from database EMBL:

Homo sapiens chromosome 17, clone HCIT524C5, complete sequence. 
Score = 5245, P = 0.0e+00, identities = 1049/1049 
3 exons 

Entry HS556361 from database EMBL: 
human STS TIGR-A003N29.

Score = 1005, P = 1.3e-39, identities = 201/201 

Entry HSG043 from database EMBL: 
human STS SHGC-36031. 
Score = 955, P = 2.8e-37, identities = 205/215 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 554 bp to 1168 bp; peptide length: 205 
Category: putative protein 
Classification: no clue 



1 MSVDPMTYEA QFFGFTPQTC MLRIYIAFQD YLFEVMQAVE QVILKKLDGI
51 PDCDISPVQI RKCTEKFLCF MKGHFDNLFS KMEQLFLQLI LRIPSNILLP
101 EDKCKETPYS EEDFQHLQKE IEQLQEKYKT ELCTKQALLA ELEEQKIVQA
151 KLKQTLTFFD ELHNVGRDHG TSDFRESLVS LVQNSRKLQN IRDNVEKESK
201 RLKIS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9i20, frame 2 

TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds., N = 1, Score = 107, P = 0.0081



>TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds.

Length = 1,583

HSPs: 

Score = 107 (16.1 bits), Expect = 8.2e-03, P = 8.1e-03 
Identities = 42/140 (30%), Positives = 76/140 (54%) 

Query: 65 EKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEED----FQHLQKE 120

EK CF+K H +NL +EQ +L R ILL +D ++P + D + L+++ 

Sbjct: 796 EKEKCFIKEH-ENLKPLLEQK--ELRDRRAELILL-KDSLAKSPSVKNDPLSSVKELEEK 851 

Query: 121 IEQLQE--KYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESL 178

IE L++ K K E K L+A ++ +K + + K+T T +EL ++ + S+ 
Sbjct: 852 IENLEKECKEKEEKINKIKLVA-VKAKKELDSSRKETQTVKEELESLRSEK--DQLSASM 908 

Query: 179 VSLVQNSRKLQNIRDNVEKESKRLKI 204 

L+Q + +N+ EK+S++L + 
Sbjct: 909 RDLIQGAESYKNLLLEYEKQSEQLDV 934 



Pedant information for DKFZphtes3_9i20, frame 2 



Report for DKFZphtes3_9i20.2



[LENGTH] 205 

[MW] 24140.13 

[pi] 5.51 

[KW] All_Alpha 

[KW] COILED_COIL 18.05 %

SEQ MSVDPMTYEAQFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVILKKLDGIPDCDISPVQI

PRD cccccchhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

COILS 

SEQ RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEEDFQHLQKE 

PRD cccchhhhhhhcccccchhhhhhhhhhhhhhhcccceeeccccccccccchhhhhhhhhh 

COILS CCCCCCCCCC 

SEQ IEQLQEKYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESLVS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LVQNSRKLQNIRDNVEKESKRLKIS 

PRD hhcccchhhhhhhhhhhhhhhcccc 

COILS 

(No Prosite data available for DKFZphtes3_9i20.2)
(No Pfam data available for DKFZphtes3_9i20.2)
DKFZphtes3_9k22



group: testes derived 

DKFZphtes3_9k22 encodes a novel 304 amino acid protein with partial similarity to X. laevis
katanin p80.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to C-terminus of katanin p80 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2676 bp 

Poly A stretch at pos. 2665, no polyadenylation signal found



1 CTCTCTAGGC TGCCGGGCGC TGGTCGTCAG CGCCGAGGCT GGGCTGAGGC 
51 GCCGCGGTAC CATGAGGCGC CGGTACTTAA GAGATTATGG CATCAGAAAC 
101 CCACAATGTT AAAAAACGGA ACTTTTGTAA TAAGATTGAG GATCATTTCA 
151 TTGATCTTCC TAGAAAAAAG ATCTCTAATT TCACTAATAA GAACATGAAG 
201 GAGGTTAAGA AATCTCCAAA ACAGTTGGCT GCTTACATAA ATAGAACAGT 
251 TGGACAAACT GTGAAAAGCC CAGATAAACT TCGTAAAGTG ATCTATCGCA 
301 GAAAGAAAGT TCATCATCCC TTTCCAAATC CTTGTTACAG AAAAAAACAG 
351 TCCCCTGGAA GTGGGGGCTG TGACATGGCA AATAAAGAAA ATGAACTGGC 
401 TTGTGCAGGC CACCTGCCTG AAAAATTACA CCATGATAGT CGAACATATT 
451 TGGTTAACTC CAGTGATTCT GGTTCTTCAC AGACAGAAAG CCCATCATCA 
501 AAATATAGTG GGTTTTTTTC TGAGGTTTCT CAGGACCATG AAACAATGGC 
551 CCAAGTTTTG TTCAGCAGGA ATATGAGATT GAATGTAGCT TTAACTTTCT 
601 GGAGAAAGAG AAGTATAAGT GAACTTGTAG CTTATTTGTT GAGGATAGAA 
651 GATCTTGGCG TTGTGGTAGA TTGCCTTCCT GTGCTCACCA ATTGTTTACA 
701 GGAAGAAAAA CAATATATCT CACTTGGCTG CTGTGTTGAC TTGTTGCCTC 
751 TAGTAAAGTC ACTACTTAAA AGCAAATTTG AAGAATATGT TATAGTTGGT 
801 TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGGTCAG AACTATCATC 
851 CAAAACAGAA ATTATAAATG ATGGAAATAT TCAAATTTTA AAACAACAAT 
901 TAAGTGGATT ATGGGAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT 
951 ACTGGTAATA TAGCTAAGGA TGTAGATGCT TATTTATTAC AGTTACATTG 
1001 AGAGATTTCA TCTACTAAAG AGCATTTGGT TTTTCAAAAC ATCCCTGAAC 
1051 TGTATAATTT ACAAAAAAAA AAGTCTCGTC TGAGAACTGT GAACTGTGGA 
1101 AGAAATCAAA ACTATTTTTT CTTTTAAAAA GCCACGTAAT GAAACCACTA 
1151 ATGAAATCCC AGCAATCTGC TTCACATTGA AGTGGAAAAA TATCCAAAAG 
1201 GAGCAGCTTC AATTTCATTG AGGTGAAAGT GCACTATGAA GATTGTTCAC 
1251 CTTTGCTGCA TTTGGGAGTT ATATGGTTAT TTGGTAACAT TAAGAACTAC 
1301 TGGATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATGTGAAAA 
1351 AATAAGACAG GACTTACCAC TAGGAACCAC CAAGACCAAT CATCATTAAC 
1401 TTTTTTAAGA TTGTGTTTTA TTAAAAAAAA AAAACACTTA AATGTGTGCA 
1451 GCTATTTTCT TATGTTGAAA AGACTGAAAG TTTAAAACAT GAAAAAAATC 
1501 AATATTAAAC ATTTTTTGTT CACACTGAGA TACTGTGTAT GTAAAATGCC 
1551 TTAATTATTA ATAAGCCAAT GTGTTATGAT ACCAATATCT GTTTTAAAAA 
1601 ACTAAAACCA ACCATGCTTC TGGCATGATA AAATCATGGA ATTAAATCAG 
1651 GGGTTTACAT TCTTGTAGAG TGTTCTTGAA ACACTCTCTG CACCATTTTT 
1701 AAAACTTGAG AATAGTTTTA GTATCTCTGA TATTTTTTGC CAGAATCATC 
1751 ATGTCATGTA TGAATGTGTT ATCCCTATCT AAGGAAAAAG GTGAATATGT 
1801 TTTTGTATGA ATGTTTAACT GGAAATGTCC ATGGACTTGG CTAATTTATA 
1851 TTTACTTTTT ATTGTACATA GATTTCTAAT ATTTTTCATT CCTGTATCAT 
1901 TTAAACTTCC TTCATTTGAG TAAATTCACT AAATATTTCT ATTTTTTTGC 
1951 TTTTTTAAAT TCTGATTTTA TATGAATTCT AATTCTTTTT CACTACATAT 
2001 GTTTTAAAGA GTTACATACA GTGATTTAGA ATGGTTTACA GTTAATGCTG 
2051 ATCTTGTATT TTAAATTCCA ACACTTTGTG TCACTACCTC CTCTAATGGT 
2101 TAGTATGATA TGCTAGCAGA CTGTATGAGG TCTTTTTTTA AAATACCACT 
2151 TTTAGTGTCA GTGAACCAAA TTCTGGAATG TCTTAACAGC TCTAAATCTT 
2201 ACTTGTCTTG AAAATGATTG GGGTTTAATA CCACTGCTGG TGGTTCACAC 
2251 ATCATCCCAT CCTTAATATG CCTGACAGGC ATCTGAGCAA AGGTTTTTAG 
2301 TAATTGAATT TCTCTGCAGT AGTCCTTCAA GCACTTGAAT GTAAACCTTT 
2351 AGCATTTATT CGTTTAATGA CTACTGATAC GAATCTCAAG CAGATTTCTT 
2401 GCTCTTAAAA GTTATGTTTC ACTGAGTTCT GGTTTTGTGT AGCTATATTT 
2451 TATATAGCTA GATATTCCTC ACAGTGAACA TGAATTGTAA TAATTGGTTA
2501 TTTCCTTAAG TCTTTAGATT ATAATAATTT CAGATTATTG CACGTCTGTG
2551 ATTTGAGAGG TGAGTTATTT AAGAGGCCAG TTTTCAGGAC ATGGGAATTT
2601 GAATTGTAAA CCTGTTATCT CTGTGAAACT TTTAACATGA TAAAATATAA
2651 CCTTTCTTTG TGCTTAAAAA AAAAAA

BLAST Results



Entry HS541354 from database EMBL: 
human STS WI-11840. 
Score = 1267, P = 7.1e-50, identities = 271/281 



Medline entries 



98227670: 

Katanin, a microtubule-severing protein, is a novel AAA ATPase 
that targets to the centrosome using a WD40-containing subunit. 



Peptide information for frame 3 



ORF from 87 bp to 998 bp; peptide length: 304 
Category: similarity to known protein 
Classification: unclassified 



1 MASETHNVKK RNFCNKIEDH FIDLPRKKIS NFTNKNMKEV KKSPKQLAAY
51 INRTVGQTVK SPDKLRKVIY RRKKVHHPFP NPCYRKKQSP GSGGCDMANK
101 ENELACAGHL PEKLHHDSRT YLVNSSDSGS SQTESPSSKY SGFFSEVSQD
151 HETMAQVLFS RNMRLNVALT FWRKRSISEL VAYLLRIEDL GVVVDCLPVL
201 TNCLQEEKQY ISLGCCVDLL PLVKSLLKSK FEEYVIVGLN WLQAVIKRWW
251 SELSSKTEII NDGNIQILKQ QLSGLWEQEN HLTLVPGYTG NIAKDVDAYL
301 LQLH



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9k22, frame 3 

TREMBL:AF056021_1 product: "p80 katanin"; Xenopus laevis p80 katanin
mRNA, partial cds., N = 1, Score = 146, P = 1.2e-07

TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin
p80 subunit mRNA, complete cds., N = 1, Score = 150, P = 1.2e-07

TREMBL:AF052433_1 product: "katanin p80 subunit"; Strongylocentrotus
purpuratus katanin p80 subunit mRNA, complete cds., N = 2, Score = 146,
P = 4.2e-07



>TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin p80
subunit mRNA, complete cds.
Length = 655

HSPs: 

Score = 150 (22.5 bits), Expect = 1.2e-07, P = 1.2e-07 
Identities = 35/105 (33%), Positives = 55/105 (52%) 

Query: 145 SEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISELVAYLLRIEDLGVVVDCLPVLTNCL 204

S++ + H+TM VL SR+ L+ W I V + I DL VVVD L N + 

Sbjct: 489 SQIRKGHDTMCVVLTSRHKNLDTVRAVWTMGDIKTSVDSAVAINDLSVVVDLL----NIV 544

Query: 205 QEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLNWLQAVIKRW 249 

++ L C +LP ++ LL+SK+E YV G L+ +++R+ 

Sbjct: 545 NQKASLWKLDLCTTVLPQIEKLLQSKYESYVQTGCTSLKLILQRF 589 



Pedant information for DKFZphtes3_9k22, frame 3 



Report for DKFZphtes3_9k22.3



[LENGTH] 304 

[MW] 34767.24 

[pi] 9.18 

[KW] All_Alpha

[KW] LOW_COMPLEXITY 3.95 %

SEQ MASETHNVKKRNFCNKIEDHFIDLPRKKISNFTNKNMKEVKKSPKQLAAYINRTVGQTVK 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccccc 

SEQ SPDKLRKVIYRRKKVHHPFPNPCYRKKQSPGSGGCDMANKENELACAGHLPEKLHHDSRT

SEG 

PRD ccchhhhhhhhhhhcccccccccccccccccccccccccchhhhhhccccccccccccce 

SEQ YLVNSSDSGSSQTESPSSKYSGFFSEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISEL 

SEG 

PRD eeecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VAYLLRIEDLGVVVDCLPVLTNCLQEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhcceeeeeeccchhhhhhhhceeeccceeeehhhhhhhhhhhheeeeeeehh 

SEQ WLQAVIKRWWSELSSKTEIINDGNIQILKQQLSGLWEQENHLTLVPGYTGNIAKDVDAYL

SEG 

PRD hhhhhhhhhhhhcccceeeeccccccccccccchhhhhhhhhhccccccccchhhhhhhh 

SEQ LQLH 

SEG .... 

PRD hccc 

(No Prosite data available for DKFZphtes3_9k22.3)
(No Pfam data available for DKFZphtes3_9k22.3)
Prosite Key

NAME: N-glycosylation site. 
CONSENSUS: N-{P}-[ST]-{P}. 

NAME: Glycosaminoglycan attachment site. 
CONSENSUS: S-G-x-G. 

NAME: Tyrosine sulfation site. 

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site. 
CONSENSUS: [RK](2)-x-[ST]. 

NAME: Protein kinase C phosphorylation site. 
CONSENSUS: [ST]-x-[RK]. 

NAME: Casein kinase II phosphorylation site. 
CONSENSUS: [ST]-x(2)-[DE]. 

NAME: Tyrosine kinase phosphorylation site. 
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y. 

NAME: N-myristoylation site. 

CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. 

NAME: Amidation site. 
CONSENSUS: x-G-[RK]-[RK].

NAME: Aspartic acid and asparagine hydroxylation site. 
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C. 

NAME: Vitamin K-dependent carboxylation domain. 

CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(9)-[FYW]. 
NAME: Phosphopantetheine attachment site. 

CONSENSUS: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-
CONSENSUS: {PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-
CONSENSUS: x(2)-[LIVMFA].

NAME: Acyl carrier protein phosphopantetheine domain profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment site. 

CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.

NAME: Prokaryotic N-terminal methylation site. 

CONSENSUS: [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-E-[LIVMFWSTAG](14).

NAME: Prenyl group binding site (CAAX box). 
CONSENSUS: C-{DENQ}-[LIVM]-x>.

NAME: Protein splicing signature. 

CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC]. 

NAME: Endoplasmic reticulum targeting sequence. 
CONSENSUS: [KRHQSA]-[DENQ]-E-L>.

NAME: Microbodies C-terminal targeting signal. 
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>.

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide. 
CONSENSUS: L-P-x-T-G-[STGAVDE]. 

NAME: Bipartite nuclear targeting sequence. 

NAME: Cell attachment sequence. 
CONSENSUS: R-G-D. 

NAME: ATP/GTP-binding site motif A (P-loop). 
CONSENSUS: [AG]-x(4)-G-K-[ST]. 

NAME: Cyclic nucleotide-binding domain signature 1. 

CONSENSUS: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G.
NAME: Cyclic nucleotide-binding domain signature 2.

CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
NAME: cAMP/cGMP binding motif. 
NAME: EF-hand calcium-binding domain. 

CONSENSUS: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
CONSENSUS: [DE]-[LIVMFYW]. 

NAME: Actinin-type actin-binding domain signature 1. 
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N. 

NAME: Actinin-type actin-binding domain signature 2. 

CONSENSUS: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-
CONSENSUS: [LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2).

NAME: Anaphylatoxin domain signature. 

CONSENSUS: [CSH]-C-x(2)-[GAP]-x(7,8)-[GASTDEQR]-C-[GASTDEQL]-x(3,9)-[GASTDEQN]-x(2)-
CONSENSUS: [CE]-x(6,7)-C-C. 

NAME: Anaphylatoxin domain profile. 

NAME: Apple domain. 

CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T-
CONSENSUS: x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-
CONSENSUS: S-G-x-[ST]-[LIVMFY]-x(2)-C.

NAME: Band 4.1 family domain signature 1. 

CONSENSUS: W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-[LIVMF]-x(6,8)-[LIVMF]-
CONSENSUS: x(3,5)-F-[FY]-x(2)-[DENS].

NAME: Band 4.1 family domain signature 2. 

CONSENSUS: [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-
CONSENSUS: [FY]-G-x-[DENQST]-[LIVMFYS].

NAME: Band 4.1 family domain profile. 

NAME: C1q domain signature.

CONSENSUS: F-x(5)-[ND]-x(4)-[FWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY].

NAME: C-terminal cystine knot signature. 

CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C.

NAME: C-terminal cystine knot profile. 

NAME: CUB domain profile. 

NAME: Death domain profile. 

NAME: EGF-like domain signature 1. 
CONSENSUS: C-x-C-x(5)-G-x(2)-C. 

NAME: EGF-like domain signature 2. 
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C. 

NAME: Calcium-binding EGF-like domain pattern signature. 

CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C. 

NAME: Laminin-type EGF-like (LE) domain signature. 

CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1.

CONSENSUS: [GAS]-W-x(7,15)-[FYW]-[LIV]-x-[LIVFA]-[GSTDEN]-x(6)-[LIVF]-x(2)-[IV]-x-
CONSENSUS: [LIVT]-[QKM]-G. 

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2. 
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C.

NAME: Forkhead-associated (FHA) domain profile. 

NAME: Fibrinogen beta and gamma chains C-terminal domain signature. 
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G.

NAME: Type I fibronectin domain.




CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C. 
NAME: Type II fibronectin collagen-binding domain. 

CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x- 
CONSENSUS: [FYWI]-C. 

NAME: Hemopexin domain signature. 

CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY].

NAME: Kringle domain signature. 
CONSENSUS: [FY]-C-R-N-P-[DNR]. 

NAME: Kringle domain profile. 

NAME: LDL-receptor class A (LDLRA) domain signature. 

CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)-
CONSENSUS: C. 

NAME: LDL-receptor class A (LDLRA) domain profile. 
NAME: C-type lectin domain signature. 

CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-
CONSENSUS: C. 

NAME: C-type lectin domain profile. 

NAME: Link domain signature. 

CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C. 
NAME: Osteonectin domain signature 1.

CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P.

NAME: Osteonectin domain signature 2.
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ].

NAME: Somatomedin B domain signature.

CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C.

NAME: Thyroglobulin type-1 repeat signature. 

CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)- 
CONSENSUS: [SG]. 

NAME: P-type 'Trefoil' domain signature. 

CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].
NAME: Cellulose-binding domain, bacterial type.

CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA].
NAME: Cellulose-binding domain, fungal type. 

CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C. 

NAME: Chitin recognition or binding domain signature. 
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C. 

NAME: Barwin domain signature 1. 
CONSENSUS: C-G-[KR]-C-L-x-V-x-N. 

NAME: Barwin domain signature 2. 
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C. 

NAME: BIR repeat. 

CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-X-[FYHDA]-X(4)-
CONSENSUS: [DESL]-X(2,3)-C-X(2)-C-X(6)-[WA]-X(9)-H-X(4)-[PRSD]-X-C-X(2)-[LIVMA].

NAME: WAP-type 'four-disulfide core' domain signature. 
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C. 

NAME: Phorbol esters / diacylglycerol binding domain. 

CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-
CONSENSUS: x(2)-C-x(5,9)-C.

NAME: C2 domain signature.

CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].
NAME: C2-domain profile.
NAME: CAP-Gly domain signature. 

CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G- 
CONSENSUS: x(2)-[LY]-F. 

NAME: Ly-6 / u-PAR domain signature. 

CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C- 
CONSENSUS: x(12,24)-C. 

NAME: MAM domain signature. 

CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVM]-x-
CONSENSUS: F-x-[LIVMFY]-x(3)-[GSC].

NAME: MAM domain profile.

NAME: PH domain profile.

NAME: Phosphotyrosine interaction domain (PID) profile.

NAME: Src homology 2 (SH2) domain profile.

NAME: Src homology 3 (SH3) domain profile.

NAME: VWFC domain signature.



CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C. 
NAME: WW/rsp5/WWP domain signature. 

CONSENSUS: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.
NAME: WW/rsp5/WWP domain profile. 
NAME: ZP domain signature. 

CONSENSUS: [LIVMFYW]-x(7)-[STAPDNL]-x(3)-[LIVMFYW]-x-[LIVMFYW]-x-[LIVMFYW]-x(2)-C-
CONSENSUS: [LIVMFYW]-x-[ST]-[PSL]-x(2,4)-[DENS]-x-[STADNQLF]-x(6)-[LIVM](2)-x(3,4)-
CONSENSUS: C. 

NAME: S-layer homology domain signature. 

CONSENSUS: [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[WYFPDA]-x(4)-[LIV]-x(2)-[GTALV]-
CONSENSUS: x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-[LIVMA]-
CONSENSUS: [STKR]-[RY]-x-[EQ]-x-[STALIVM]. 

NAME: 'Homeobox' domain signature. 

CONSENSUS: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
CONSENSUS: [LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].

NAME: 'Homeobox' domain profile. 

NAME: 'Homeobox' antennapedia-type protein signature. 
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA] . 

NAME: 'Homeobox' engrailed-type protein signature. 
CONSENSUS: L-M-A-Q-G-L-Y-N. 

NAME: 'Paired box' domain signature. 
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1.

CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G.
NAME: 'POU' domain signature 2.

CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].
NAME: Zinc finger, C2H2 type, domain.

CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.

NAME: Zinc finger, C3HC4 type (RING finger), signature. 
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]. 

NAME: Nuclear hormones receptors DNA-binding region signature. 
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R.

NAME: GATA-type zinc finger domain.

CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature.

CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)-
CONSENSUS: C. 

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile. 

NAME: Prokaryotic dksA/traR C4-type zinc finger.
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C.

NAME: Copper-fist domain signature.

CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)-
CONSENSUS: [KR]-x-[KR]-G-R-P.

NAME: Copper fist DNA binding domain profile.

NAME: Leucine zipper pattern.
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L.

NAME: bZIP transcription factors basic domain signature.

CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

NAME: Myb DNA-binding domain repeat signature 1.
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]. 

NAME: Myb DNA-binding domain repeat signature 2. 

CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature.

CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-
CONSENSUS: [VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR].

NAME: p53 tumor antigen signature.
CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R.

NAME: CBF-A/NF-YB subunit signature. 

CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C. 
NAME: CBF-B/NF-YA subunit signature. 

CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E. 
NAME: 'Cold-shock' DNA-binding domain signature. 

CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]. 

NAME: CTF/NF-I signature. 

CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R. 

NAME: Ets-domain signature 1.

CONSENSUS: L-[FYW]-[QEDH]-F-[LI]-[LVQK]-x-[LI]-L.
NAME: Ets-domain signature 2.

CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y.

NAME: Ets-domain profile.

NAME: Fork head domain signature 1.

CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM]. 

NAME: Fork head domain signature 2. 
CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H. 

NAME: Fork head domain profile. 

NAME: HSF-type DNA-binding domain signature. 

CONSENSUS: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K- 
CONSENSUS: [LIVM]. 

NAME: Tryptophan pentad repeat (IRF family) signature. 

CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x-
CONSENSUS: [WR]-A.
NAME: LIM domain signature. 

CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF]. 

NAME: LIM domain profile. 

NAME: NF-kappa-B/Rel/dorsal domain signature. 
CONSENSUS: F-R-Y-x-C-E-G. 

NAME: MADS-box domain signature. 

CONSENSUS: R-x-[RK]-x(5)-I-x-[DN]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x- 
CONSENSUS: K(2)-A-x-E-[LIVM]-[ST]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-
CONSENSUS: [FY]. 

NAME: MADS-box domain profile. 

NAME: T-box domain signature 1. 

CONSENSUS: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ]. 
NAME: T-box domain signature 2. 

CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.
NAME: TEA domain signature. 

CONSENSUS: G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]- 
CONSENSUS: Q-V. 

NAME: Transcription factor TFIIB repeat signature. 

CONSENSUS: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]- 
CONSENSUS: [GSA]-[STAC]. 

NAME: Transcription factor TFIID repeat signature.

CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-
CONSENSUS: [STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM].

NAME: TFIIS zinc ribbon domain signature. 

CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]- 
CONSENSUS: x(6)-C-x(2,5)-C-x(3)-[FW]. 

NAME: TSC-22 / dip / bun family signature. 
CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E. 

NAME: Prokaryotic transcription elongation factors signature 1. 

CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]-
CONSENSUS: x(6)-G-D-x(2)-E-N-[GSA]-x-Y. 

NAME: Prokaryotic transcription elongation factors signature 2. 

CONSENSUS: S-x(2)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE]. 

NAME: DEAD-box subfamily ATP-dependent helicases signature. 
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.
CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

NAME: Fibrillarin signature. 

CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].
NAME: MCM family signature.

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].
NAME: MCM family domain.
NAME: XPA protein signature 1.

CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.
NAME: XPA protein signature 2.

CONSENSUS: [LIVM](2)-T-[KR]-T-E-x-K-x-[DE]-Y-[LIVMF](2)-x-D-x-[DE].
NAME: XPG protein signature 1.

CONSENSUS: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K. 






NAME: XPG protein signature 2. 

CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]. 
NAME: Bacterial regulatory proteins, araC family signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]- 
CONSENSUS: x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)- 
CONSENSUS: [FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL]. 
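
The consensus entries in this listing use PROSITE-style notation: residues separated by hyphens, x for any residue, square brackets for allowed residues, braces for excluded residues, and (n) or (n,m) for repeat counts. As an illustration only, the following minimal Python sketch shows one way such a pattern could be translated into a regular expression for scanning a protein sequence. The function name prosite_to_regex and the test sequence are assumptions introduced here and are not part of the source; multi-line patterns are assumed to have been joined into a single string first.

```python
import re

# Hypothetical helper (not from the source document): translates the
# PROSITE-style notation used in these listings into a Python regex.
# Supported elements: single residues, x, [..], {..}, and (n) / (n,m) counts.
# Anchors such as '<' and '>' are deliberately not handled in this sketch.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    elements = pattern.strip().rstrip(".").split("-")
    out = []
    for element in elements:
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..} lists excluded residues
        if low is not None:
            # (n) becomes {n}; (n,m) becomes {n,m}
            core += "{" + low + ("," + high if high else "") + "}"
        out.append(core)
    return "".join(out)


# The araC family signature above, with its three CONSENSUS lines joined.
ARAC = (
    "[KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-"
    "x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-"
    "[FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL]."
)
ARAC_REGEX = re.compile(prosite_to_regex(ARAC))
```

A compiled pattern can then be applied with ARAC_REGEX.search(sequence) to a one-letter amino acid string. Dedicated motif-scanning tools handle the full PROSITE syntax and are likely preferable for real analyses.
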

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile. 

NAME: Bacterial regulatory proteins, arsR family signature. 
CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ].

NAME: Bacterial regulatory proteins, asnC family signature.

CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-
CONSENSUS: [LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature. 

CONSENSUS: [LIVM]-[STAG]-[RHNW]-x(2)-[LIM]-[GA]-x-[LIVMFYA]-[LIVSC]-[GA]-x-[STACN]- 
CONSENSUS: x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-
CONSENSUS: [LIVMF].

NAME: Bacterial regulatory proteins, gntR family signature. 

CONSENSUS: [LIVAPKR]-[PILV]-x-[EQTIVMR]-x(2)-[LIVM]-x(3)-[LIVMFYK]-x-[LIVFT]- 
CONSENSUS: [DNGSTK]-[RGTLV]-x-[STAIVP]-[LIVA]-x(2)-[STAGV]-[LIVMFYH]-x(2)-[LMA].

NAME: Bacterial regulatory proteins, iclR family signature.

CONSENSUS: [GA]-x(3)-[DS]-x(2)-E-x(6)-[CSA]-[LIVM]-[GSA]-x(2)-[LIVM]-[FYH]-[DN].
NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)- 
CONSENSUS: [LIVMFYAN]-[LIVMC]. 

NAME: Bacterial regulatory proteins, luxR family signature. 

CONSENSUS: [GDC]-x(2)-[NSTAVY]-x(2)-[IV]-[GSTA]-x(2)-[LIVMFYWCT]-x-[LIVMFYWCR]-x(3)-
CONSENSUS: [NST]-[LIVM]-x(5)-[NRHSA]-[LIVMSTA]-x(2)-[KR].

NAME: Bacterial regulatory proteins, lysR family signature. 

CONSENSUS: [NQKRHSTAG]-[LIVMFYTA]-x(2)-[STAGLV]-[STAG]-x(4)-[LIVMYCTQR]-[PSTANLVER]- 
CONSENSUS: x-[PSTAGQV]-[PSTAGNVMF]-[LIVMFA]-[STAGH]-x(2)-[LIVMF]-x(2)-[LIVMFW]-
CONSENSUS: [RKEAV]-x(2)-[LIVMFYNTAE]-x(3)-[LIMVT].

NAME: Bacterial regulatory proteins, marR family signature. 

CONSENSUS: [STNA]-[LIA]-x-[RNGS]-x(4)-[LM]-[EIV]-x(2)-[GES]-[LFYW]-[LIVC]-x(7)- 
CONSENSUS: [DN]-[RKQG]-[RK]-x(6)-T-x(2)-[GA].

NAME: Bacterial regulatory proteins, merR family signature.

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHK]-x(3)-
CONSENSUS: [LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature. 

CONSENSUS: G-[LIVMFYS]-x(2,3)-[TS]-[LIVMT]-x(2)-[LIVM]-x(5)-[LIVQS]-[STAGENQH]-x-
CONSENSUS: [GPAR]-x-[LIVMF]-[FYST]-x-[HFY]-[FV]-x-[DNST]-K-x(2)-[LIVM]. 

NAME: Transcriptional antiterminators bglG family signature. 
CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK]. 

NAME: Sigma-54 factors family signature 1. 

CONSENSUS: P-[LIVM]-x-[LIVM]-x(2)-[LIVM]-A-x(2)-[LIVMF]-x(2)-[HS]-x-S-T-[LIVM]-S-R.

NAME: Sigma-54 factors family signature 2. 
CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R. 

NAME: Sigma-54 factors family profile. 

NAME: Sigma-70 factors family signature 1. 

CONSENSUS: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP]. 
NAME: Sigma-70 factors family signature 2. 

CONSENSUS: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-
CONSENSUS: [LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].
NAME: Sigma-70 factors ECF subfamily signature. 

CONSENSUS: [STAIV]-[PQDEL]-[DE]-[LIV]-[LIVTA]-Q-x-[STAV]-[LIVMFYC]-[LIVMAK]-x-
CONSENSUS: [GSTAIV]-[LIMFYWQ]-x(12,14)-[STAP]-[FYW]-[LIF]-x(2)-[IV]. 

NAME: Sigma-54 interaction domain ATP-binding region A signature. 
CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]. 

NAME: Sigma-54 interaction domain ATP-binding region B signature. 

CONSENSUS: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]- 
CONSENSUS: [LIVM].

NAME: Sigma-54 interaction domain C-terminal part signature. 
CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]. 

NAME: Sigma-54 interaction domain profile. 

NAME: Single-strand binding protein family signature 1. 

CONSENSUS: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET].

NAME: Single-strand binding protein family signature 2.

CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature.

CONSENSUS: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x-[GA]-x-[KN]-P-x-T.
NAME: Dps protein family signature 1. 

CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE]. 
NAME: Dps protein family signature 2. 

CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature. 
CONSENSUS: H-N-H-P-S-G. 

NAME: recA signature. 

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R. 
NAME: RecF protein signature 1. 

CONSENSUS: P-[ED]-x(3)-[LIVM](2)-x-G-[GSAD]-P-x(2)-R-R-x-[FY]-[LIVM]-D.
NAME: RecF protein signature 2. 

CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L. 
NAME: RecR protein signature. 

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R.

NAME: Histone H2A signature. 
CONSENSUS: [AC]-G-L-x-F-P-V. 

NAME: Histone H2B signature. 

CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A- 
CONSENSUS: [LIVM]-[STA]-E-G. 

NAME: Histone H3 signature 1. 
CONSENSUS: K-A-P-R-K-Q-L. 

NAME: Histone H3 signature 2. 

CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature. 
CONSENSUS: G-A-K-R-H. 

NAME: HMG1/2 signature. 

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M. 

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook). 
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature. 
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P. 

NAME: Bromodomain signature.
CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-
CONSENSUS: [LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile. 

NAME: Chromo domain signature. 
CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-
CONSENSUS: [LIVMC]. 

NAME: Chromo and chromo shadow domain profile. 

NAME: Regulator of chromosome condensation (RCC1) signature 1. 
CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T. 

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S. 

NAME: Nuclear transition protein 1 signature. 
CONSENSUS: S-K-R-K-Y-R-K. 

NAME: Nuclear transition protein 2 signature 1.
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S.

NAME: Nuclear transition protein 2 signature 2. 
CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K. 

NAME: Ribosomal protein L1 signature.

CONSENSUS: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-
CONSENSUS: [LMF]-P-[DENSTK]. 

NAME: Ribosomal protein L2 signature. 

CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE].

NAME: Ribosomal protein L3 signature.

CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.
NAME: Ribosomal protein L5 signature.

CONSENSUS: [LIVM]-x(2)-[LIVM]-[STAC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]-x-[STAG]-[KR]-
CONSENSUS: x-[STA].

NAME: Ribosomal protein L6 signature 1.
CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM].

NAME: Ribosomal protein L6 signature 2. 

CONSENSUS: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]. 
NAME: Ribosomal protein L9 signature. 

CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]. 
NAME: Ribosomal protein L10 signature. 

CONSENSUS: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R.
NAME: Ribosomal protein L11 signature.

CONSENSUS: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG].
NAME: Ribosomal protein L13 signature. 

CONSENSUS: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[AIV]- 
CONSENSUS: [LFY]-x-[GDN].

NAME: Ribosomal protein L14 signature.

CONSENSUS: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV].
NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2)-[GAL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-
CONSENSUS: [LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1. 

CONSENSUS: [KR]-R-x-[GSAC]-[KQVA]-[LIVM]-W-[LIVM]-[KR]-[LIVM]-[LFY]-[AP].

NAME: Ribosomal protein L16 signature 2.
CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR].
NAME: Ribosomal protein L17 signature.

CONSENSUS: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAG]-[KR].
NAME: Ribosomal protein L19 signature. 

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R. 
NAME: Ribosomal protein L20 signature. 

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].
NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.
NAME: Ribosomal protein L22 signature. 

CONSENSUS: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]. 
NAME: Ribosomal protein L23 signature. 

CONSENSUS: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANQK]-x(7)-[LIVMFT]. 
NAME: Ribosomal protein L24 signature. 

CONSENSUS: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KA]-[GN]-x(2,3)-[GA]-x-[IV].

NAME: Ribosomal protein L27 signature. 
CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G. 

NAME: Ribosomal protein L29 signature. 

CONSENSUS: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRH]-[DESTANRL]-
CONSENSUS: [LIV]-A-[KRCQVT]-[LIVMA]. 

NAME: Ribosomal protein L30 signature. 

CONSENSUS: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]- 
CONSENSUS: x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature.

CONSENSUS: H-P-F-[FY]-[TI]-x(9)-G-R-[AV]-x-[KR].

NAME: Ribosomal protein L33 signature.

CONSENSUS: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PAT]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD].
NAME: Ribosomal protein L34 signature.

CONSENSUS: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.
NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].
NAME: Ribosomal protein L36 signature.

CONSENSUS: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q.
NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3)-[KR]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-G-H.

NAME: Ribosomal protein L6e signature. 

CONSENSUS: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K. 

NAME: Ribosomal protein L7Ae signature. 

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V.

NAME: Ribosomal protein L13e signature.

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.
NAME: Ribosomal protein L15e signature.

CONSENSUS: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-V-x-R-G.
NAME: Ribosomal protein L18e signature. 

CONSENSUS: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x-[RK]-
CONSENSUS: [LIVM]. 

NAME: Ribosomal protein L19e signature. 

CONSENSUS: R-x-[KR]-x(5)-[KR]-x(3)-[KRH]-x(2)-G-x-G-x-R-x-G-x(3)-A-R-x(3)-[KQ]- 
CONSENSUS: x(2)-W-x(7)-R-x(2)-L-x(3)-R.
NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G. 
NAME: Ribosomal protein L24e signature. 

CONSENSUS: [FY]-x-[GS]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D. 

NAME: Ribosomal protein L27e signature. 
CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F.

NAME: Ribosomal protein L30e signature 1.

CONSENSUS: [STA]-x(5)-G-x-[QKR]-x(2)-[LIVM]-[KQT]-x(2)-[KR]-x-G-x(2)-K-x-[LIVM](3).
NAME: Ribosomal protein L30e signature 2.

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.
NAME: Ribosomal protein L31e signature.

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AK]-x-W-x-[KR]-G.
NAME: Ribosomal protein L32e signature. 

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G. 

NAME: Ribosomal protein L34e signature. 
CONSENSUS: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G. 

NAME: Ribosomal protein L35Ae signature. 

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P. 
NAME: Ribosomal protein L36e signature. 

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]. 
NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.
NAME: Ribosomal protein L39e signature. 

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R. 

NAME: Ribosomal protein L44e signature. 
CONSENSUS: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C. 

NAME: Ribosomal protein S2 signature 1.

CONSENSUS: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G.
NAME: Ribosomal protein S2 signature 2.

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-
CONSENSUS: [GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature.

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-
CONSENSUS: [DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G.

NAME: Ribosomal protein S4 signature. 

CONSENSUS: [LIVM]-[DE]-x-R-L-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCF]-x-[ST]-x(3)-
CONSENSUS: [SAI]-[KR]-x-[LIVMF](2). 

NAME: Ribosomal protein S5 signature. 

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x- 
CONSENSUS: [LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature. 

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]. 
NAME: Ribosomal protein S7 signature. 

CONSENSUS: [DENSK]-x-[LIVMET]-x(3)-[LIVMFT](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]- 
CONSENSUS: x(2)-[STA]. 

NAME: Ribosomal protein S8 signature. 

CONSENSUS: [GE]-x(2)-[LIV](2)-[STY]-T-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]. 
NAME: Ribosomal protein S9 signature. 

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].
NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.
NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-
CONSENSUS: x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature. 
CONSENSUS: [RK]-x-P-N-S-[AR]-x-R. 

NAME: Ribosomal protein S13 signature.

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q. 
NAME: Ribosomal protein S14 signature. 

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].

NAME: Ribosomal protein S15 signature.

CONSENSUS: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-
CONSENSUS: [FY].

NAME: Ribosomal protein S16 signature.

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR].
NAME: Ribosomal protein S17 signature.

CONSENSUS: G-D-x-[LW]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.
NAME: Ribosomal protein S18 signature.

CONSENSUS: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-
CONSENSUS: [GY]-K-[LIVM]-x(3)-R-[LIVMAS].

NAME: Ribosomal protein S19 signature.

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F- 
CONSENSUS: x(2)-[ST]. 

NAME: Ribosomal protein S21 signature. 

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR]. 

NAME: Ribosomal protein S3Ae signature. 

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L. 

NAME: Ribosomal protein S4e signature. 

CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR]. 

NAME: Ribosomal protein S6e signature. 

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M. 

NAME: Ribosomal protein S7e signature. 

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H. 

NAME: Ribosomal protein S8e signature. 

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.

NAME: Ribosomal protein S12e signature. 

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L. 

NAME: Ribosomal protein S17e signature. 

CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H. 
NAME: Ribosomal protein S19e signature. 

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]. 

NAME: Ribosomal protein S21e signature. 
CONSENSUS: L-Y-V-P-R-K-C-S-[SA]. 

NAME: Ribosomal protein S24e signature. 

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature. 
CONSENSUS: [YH]-C-V-S-C-A-I-H. 

NAME: Ribosomal protein S27e signature. 

CONSENSUS: [QK]-C-x(2)-C-x(6)-F-[GS]-x-[PSA]-x(5)-C-x(2)-C-[GS]-x(2)-L-x(2)-P-x-G. 

NAME: Ribosomal protein S28e signature. 
CONSENSUS: E-[ST]-E-R-E-A-R-x-L. 

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature. 




CONSENSUS: G-F-R-G-E-A-L. 

NAME: DNA mismatch repair proteins mutS family signature. 

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GC]-[RKH]-G-[GST]-x(4)-G.
NAME: mutT domain signature.

CONSENSUS: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E.
NAME: DnaA protein signature.

CONSENSUS: I-[GA]-x(2)-[LIVMF]-[SGDNK]-x(0,1)-[KR]-x-H-[STP]-[STV]-[LIVM](2)-x-
CONSENSUS: [SA]-x(2)-[KRE]-[LIVM]. 

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1. 
CONSENSUS: K-x-E-[LIV]-A-x-[DE]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2. 
CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2). 

NAME: Zinc-containing alcohol dehydrogenases signature. 
CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.

CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].
NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-
CONSENSUS: [LIVMF]-x(4)-E. 

NAME: Iron-containing alcohol dehydrogenases signature 2. 

CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x(2)-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-
CONSENSUS: [LIVMT]-x-[HNS]-[GA]-x-[GTAC]. 

NAME: Short-chain dehydrogenases/reductases family signature. 

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-
CONSENSUS: [LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1.

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.
NAME: Aldo/keto reductase family signature 2.

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].
NAME: Aldo/keto reductase family putative active site signature.

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].
NAME: Homoserine dehydrogenase signature.

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.
NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature.

CONSENSUS: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-
CONSENSUS: [LIVMFYW]-G-x-N.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.
CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G. 

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2. 
CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A. 

NAME: Mannitol dehydrogenases signature.

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].
NAME: Histidinol dehydrogenase signature.

CONSENSUS: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-
CONSENSUS: [SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H. 

NAME: L-lactate dehydrogenase active site. 
CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature.

CONSENSUS: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-
CONSENSUS: [LIVMT]-x(2)-[FYWCTH]-[DNSTK].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2. 

CONSENSUS: [LIVMFYWA]-[LIVFWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-
CONSENSUS: P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3.

CONSENSUS: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-
CONSENSUS: [LIVH]-[LIVMC]-[DNV]. 

NAME: 3-hydroxyisobutyrate dehydrogenase signature. 

CONSENSUS: [LIVMFY](2)-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA]. 

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1.
CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA]. 

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2. 
CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T. 

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3. 

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile. 

NAME: 3-hydroxyacyl-CoA dehydrogenase signature. 

CONSENSUS: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-
CONSENSUS: [LIVMFYCT]-[LIVMFY]-x(2)-[GV].

NAME: Malate dehydrogenase active site signature. 

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY]. 
NAME: Malic enzymes signature. 

CONSENSUS: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2).
NAME: Isocitrate and isopropylmalate dehydrogenases signature.

CONSENSUS: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-
CONSENSUS: [STG]-[LIVMPA]-G-[LIVMF].

NAME: 6-phosphogluconate dehydrogenase signature.
CONSENSUS: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site.
CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature. 

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T. 

NAME: Bacterial quinoprotein dehydrogenases signature 1. 

CONSENSUS: [DEN]-W-x(3)-G-[RK]-x(6)-[FYW]-S-x(4)-[LIVM]-N-x(2)-N-V-x(2)-L-[RK].

NAME: Bacterial quinoprotein dehydrogenases signature 2.

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site. 
CONSENSUS: S-N-H-G-[AG]-R-Q. 

NAME: GMC oxidoreductases signature 1. 

CONSENSUS: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-
CONSENSUS: [DNESH].

NAME: GMC oxidoreductases signature 2.

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.
NAME: Eukaryotic molybdopterin oxidoreductases signature.

CONSENSUS: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-
CONSENSUS: x(2)-[DE].

NAME: Prokaryotic molybdopterin oxidoreductases signature 1.

CONSENSUS: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-
CONSENSUS: [DENQKHT].

NAME: Prokaryotic molybdopterin oxidoreductases signature 2.

CONSENSUS: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E.
NAME: Prokaryotic molybdopterin oxidoreductases signature 3.

CONSENSUS: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-
CONSENSUS: x(5)-A-x-[LIVM]-[ST].
NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV]. 
NAME: Aldehyde dehydrogenases cysteine active site. 

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature.

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site. 
CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM]. 

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site. 

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P. 
NAME: Gamma-glutamyl phosphate reductase signature. 

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature. 
CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A. 

NAME: Dihydroorotate dehydrogenase signature 1. 

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].
NAME: Dihydroorotate dehydrogenase signature 2.

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.
NAME: Coproporphyrinogen III oxidase signature.

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site. 
CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G. 

NAME: Acyl-CoA dehydrogenases signature 1. 

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]. 
NAME: Acyl-CoA dehydrogenases signature 2. 

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]. 

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1.
CONSENSUS: G-[LIVM]-P-x-E-x(3)-N-E-x(1,3)-R-V-A-x-[ST]-P-x-[GST]-V-x(2)-L-x-[KRH]-
CONSENSUS: x-G.

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2.
CONSENSUS: [LIVM](2)-G-[GA]-G-x-A-G-x(2)-[SA]-x(3)-[GA]-x-[SG]-[LIVM]-G-A-x-V-
CONSENSUS: x(3)-D.

NAME: Glu / Leu / Phe / Val dehydrogenases active site.
CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].

NAME: D-amino acid oxidases signature. 

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A. 

NAME: Pyridoxamine 5'-phosphate oxidase signature. 
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.

NAME: Copper amine oxidase topaquinone signature.

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature. 
CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P. 

NAME: Lysyl oxidase putative copper-binding region signature. 
CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H.

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-
CONSENSUS: [LIV]-x(2)-[LMF]-[DENQK].

NAME: Dihydrofolate reductase signature.

CONSENSUS: [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-x(3)-[S , nQ].
NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1. 

CONSENSUS: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-
CONSENSUS: Q-L-P-[LV].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2.
CONSENSUS: P-G-G-V-G-P-[MF]-T-[IV].

NAME: Oxygen oxidoreductases covalent FAD-binding site. 

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site. 
CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P. 

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site. 
CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1.

CONSENSUS: G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-[LVMYST]-
CONSENSUS: [LIVMFYG]-x-[KR]-[EQG].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2. 

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G. 

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature. 

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT]. 

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature. 
CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P. 

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature. 

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS]. 

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature. 
CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1.
CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2. 
CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1. 
CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2. 
CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3.
CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.
CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature. 
CONSENSUS: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H. 

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature. 
CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M. 

NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature. 
CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L. 

NAME: Multicopper oxidases signature 1.

CONSENSUS: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW].

NAME: Multicopper oxidases signature 2.
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature.

CONSENSUS: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY].
NAME: Peroxidases active site signature.

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].
NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH]. 
NAME: Catalase proximal active site signature. 

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].
NAME: Glutathione peroxidases selenocysteine active site.

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2.
CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F.

NAME: Lipoxygenases iron-binding region signature 1.

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E.

NAME: Lipoxygenases iron-binding region signature 2.

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature. 

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E. 
NAME: Intradiol ring-cleavage dioxygenases signature. 

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]- 
CONSENSUS: x(6)-G-x-[FY]. 

NAME: Indoleamine 2,3-dioxygenase signature 1. 
CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q. 

NAME: Indoleamine 2,3-dioxygenase signature 2. 

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR]. 

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature. 
CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H. 

NAME: Bacterial luciferase subunits signature. 

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR]. 

NAME: ubiH/COQ6 monooxygenase family signature. 
CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D. 

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature. 
CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.
CONSENSUS: H-H-M-x(2)-F-x-C. 

NAME: Copper type II, ascorbate-dependent monooxygenases signature 2. 
CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G. 

NAME: Tyrosinase CuA-binding region signature. 

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E.

NAME: Tyrosinase and hemocyanins CuB-binding region signature. 
CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D. 

NAME: Fatty acid desaturases family 1 signature. 
CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y. 

NAME: Fatty acid desaturases family 2 signature. 

CONSENSUS: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE].

NAME: Cytochrome P450 cysteine heme-iron ligand signature. 
CONSENSUS: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD]. 

NAME: Heme oxygenase signature. 
CONSENSUS: L-L-V-A-H-A-Y-T-R. 

NAME: Copper/Zinc superoxide dismutase signature 1. 

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD]. 

NAME: Copper/Zinc superoxide dismutase signature 2. 
CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[TV]. 




NAME: Manganese and iron superoxide dismutases signature. 
CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature. 

CONSENSUS: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)- 
CONSENSUS: [PA]. 

NAME: Ribonucleotide reductase small subunit signature. 

CONSENSUS: [IVMSEQ]-E-x(1,2)-[LIVTA]-[HY]-[GSA]-x-[STAVM]-Y-x(2)-[LIVMQ]-x(3)-
CONSENSUS: [LIFY]-[IVFYCSA]. 

NAME: Nitrogenases component 1 alpha and beta subunits signature 1. 
CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C. 

NAME: Nitrogenases component 1 alpha and beta subunits signature 2. 

CONSENSUS: [STANQ]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST]. 

NAME: NifH/frxC family signature 1. 

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.

NAME: NifH/frxC family signature 2. 

CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P. 

NAME: Nickel-dependent hydrogenases large subunit signature 1.
CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C. 

NAME: Nickel-dependent hydrogenases large subunit signature 2. 
CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature. 

CONSENSUS: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-
CONSENSUS: x-[QR]-[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR]. 

NAME: Bacterial-type phytoene dehydrogenase signature. 

CONSENSUS: [NG]-x-[FYWV]-[LIVMF]-x-G-[AGC]-[GS]-[TA]-[HQT]-P-G-[STAV]-G-[LIVM]-
CONSENSUS: x(5)-[GS]. 

NAME: Glycine radical signature. 

CONSENSUS: [STIV]-x-R-[IVT]-[CSA]-G-Y-x-[GACV].

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R. 

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2. 
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G. 

NAME: NNMT/PNMT/TEMT family of methyltransferases signature. 
CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C. 

NAME: RNA methyltransferase trmA family signature 1. 

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2. 
CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site. 

CONSENSUS: R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-
CONSENSUS: x-[LV]. 

NAME: Ribosomal RNA adenine dimethylases signature. 

CONSENSUS: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-
CONSENSUS: x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D.

NAME: Methylated-DNA-protein-cysteine methyltransferase active site. 
CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature. 
CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].

NAME: N-4 cytosine-specific DNA methylases signature. 
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site. 

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.
NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM].

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature. 
CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1. 

CONSENSUS: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]. 
NAME: Uroporphyrin-III C-methyltransferase signature 2. 

CONSENSUS: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-
CONSENSUS: x-[LIVMY]-x-P-G. 

NAME: ubiE/COQ5 methyltransferase family signature 1. 
CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W. 

NAME: ubiE/COQ5 methyltransferase family signature 2. 

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S. 

NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site. 

CONSENSUS: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-
CONSENSUS: [GSA]-[GA].

NAME: Phosphoribosylglycinamide formyltransferase active site. 

CONSENSUS: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-
CONSENSUS: x(6)-[LIVM]. 

NAME: Aspartate and ornithine carbamoyltransferases signature. 
CONSENSUS: F-x-[EK]-x-S-[GT]-R-T. 

NAME: Transketolase signature 1.

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)- 
CONSENSUS: [LIMC]-[GS]. 

NAME: Transketolase signature 2. 

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-
CONSENSUS: [STAP]-x(2)-[RGA]. 

NAME: Transaldolase signature 1. 

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).
NAME: Transaldolase active site. 

CONSENSUS: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-
CONSENSUS: [QEKRST]-x-[LIVM]. 

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.

CONSENSUS: [LI]-P-x-[LVP]-P-[IVTA]-P-x-[LIVM]-x-[DENQAS]-[ST]-[LIVM]-x(2)-[LY].
NAME: Acyltransferases ChoActase / COT / CPT family signature 2. 

CONSENSUS: R-[FYW]-x-[DA]-[KA]-x(0,1)-[LIVMFY]-x-[LIVMFY](2)-x(3)-[DNS]-[GSA]-x(6)-
CONSENSUS: [DE]-[HS]-x(3)-[DE]-[GA]. 

NAME: Thiolases acyl-enzyme intermediate signature. 

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-
CONSENSUS: [LIVM]. 

NAME: Thiolases signature 2. 

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.
NAME: Thiolases active site. 

CONSENSUS: [AG]-[LIVMA]-[STAGLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG].

NAME: Chloramphenicol acetyltransferase active site. 
CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature. 

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-
CONSENSUS: [STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].

NAME: Beta-ketoacyl synthases active site. 

CONSENSUS: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF]. 
NAME: Chalcone and stilbene synthases active site. 
CONSENSUS: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-
CONSENSUS: [RA]. 

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 1. 
CONSENSUS: E-I-N-F-L-C-x-H-K. 

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 2. 
CONSENSUS: K-F-G-x-G-D-G. 

NAME: Gamma-glutamyltranspeptidase signature. 

CONSENSUS: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-
CONSENSUS: x(1,2)-[FY]-G.

NAME: Transglutaminases active site. 

CONSENSUS: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G.

NAME: Phosphorylase pyridoxal-phosphate attachment site. 
CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N. 

NAME: UDP-glycosyltransferases signature. 

CONSENSUS: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]- 
CONSENSUS: [HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]- 
CONSENSUS: x(3)-[PA]-x(3)-[DES]-[QEHN]. 

NAME: Purine/pyrimidine phosphoribosyl transferases signature. 

CONSENSUS: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-
CONSENSUS: [STAR]-[GAC]-x-[STAR].

NAME: Glutamine amidotransferases class-I active site. 

CONSENSUS: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA].

NAME: Glutamine amidotransferases class-II active site. 
CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature. 
CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature. 

CONSENSUS: [LIV]-x(3)-G-x(2)-H-x-[LIVMFY]-x(4)-[LIVMF]-x(3)-[ATV]-x(1,2)-[LIVM]-x-
CONSENSUS: [ATV]-x(4)-[GN]-x(3,4)-[LIVMF](2)-x(2)-[STN]-[SA]-x-G-[GS]-[LIVM].

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature. 
CONSENSUS: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E.

NAME: ATP phosphoribosyltransferase signature. 

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].

NAME: NAD:arginine ADP-ribosyltransferases signature. 
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature. 
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G. 

NAME: S-adenosylmethionine synthetase signature 1.
CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y. 

NAME: S-adenosylmethionine synthetase signature 2. 
CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2.

CONSENSUS: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].
NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV].
NAME: Squalene and phytoene synthases signature 2. 

CONSENSUS: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-
CONSENSUS: x-P. 

NAME: Protein prenyltransferases alpha subunit repeat signature. 

CONSENSUS: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR]. 






NAME: Riboflavin synthase alpha chain family signature. 

CONSENSUS: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1. 

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]. 
NAME: Dihydropteroate synthase signature 2. 

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P. 
NAME: EPSP synthase signature 1. 

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]. 
NAME: EPSP synthase signature 2. 

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-
CONSENSUS: [KRA]-[LIVMF]-G. 

NAME: FLAP/GST2/LTC4S family signature. 
CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C. 

NAME: Aminotransferases class-I pyridoxal-phosphate attachment site. 

CONSENSUS: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]- 
CONSENSUS: [GA]. 

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site. 

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG].

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]-x(0,1)-[LIVMFYWAG]-x(0,1)-[SACR]-x-
CONSENSUS: [GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].

NAME: Aminotransferases class-IV signature. 

CONSENSUS: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x- 
CONSENSUS: [GS]-[LIVM]-x-[KR].

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site. 

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-
CONSENSUS: x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC]. 

NAME: Hexokinases signature. 

CONSENSUS: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x- 
CONSENSUS: [LF]. 

NAME: Galactokinase signature. 

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y.

NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature. 

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R. 

NAME: pfkB family of carbohydrate kinases signature 1.
CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2.

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].
NAME: ROK family signature. 

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)- 
CONSENSUS: G-[RKH].

NAME: Phosphoribulokinase signature. 

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E. 

NAME: Thymidine kinase cellular-type signature. 

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].
NAME: FGGY family of carbohydrate kinases signature 1.

CONSENSUS: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH].
NAME: FGGY family of carbohydrate kinases signature 2. 

CONSENSUS: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]- 
CONSENSUS: [LIVMFY]-[DEQ].
NAME: Protein kinases ATP-binding region signature.

CONSENSUS: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
CONSENSUS: x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.

NAME: Serine/Threonine protein kinases active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile. 

NAME: Casein kinase II regulatory subunit signature. 

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C. 
NAME: Pyruvate kinase active site signature. 

CONSENSUS: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQH]-[GSTA]-[LIVM].
NAME: Shikimate kinase signature. 

CONSENSUS: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF].

NAME: Prokaryotic diacylglycerol kinase signature. 
CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D. 

NAME: Phosphatidylinositol 3- and 4-kinases signature 1. 

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.

NAME: Phosphatidylinositol 3- and 4-kinases signature 2. 

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.

NAME: Acetate and butyrate kinases family signature 1.
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2.

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G.

NAME: Phosphoglycerate kinase signature. 

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.
NAME: Aspartokinase signature. 

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]. 
NAME: Glutamate 5-kinase signature. 

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]- 
CONSENSUS: x(3)-G. 

NAME: ATP:guanido phosphotransferases active site. 
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.

NAME: PTS HPR component histidine phosphorylation site signature. 
CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature. 

CONSENSUS: [GSADE]-[KREQTV]-x(4)-[KRN]-S-[LIVMF](2)-x-[LIVM]-x(2)-[LIVM]-[GAD]. 

NAME: PTS EIIA domains phosphorylation site signature 1. 
CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2. 

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC]. 

NAME: PTS EIIB domains cysteine phosphorylation site signature. 

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVMF]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature. 

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]. 

NAME: Nucleoside diphosphate kinases active site. 
CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE]. 

NAME: Guanylate kinase signature. 

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]. 
NAME: Guanylate kinase domain profile.
NAME: Phosphoribosyl pyrophosphate synthetase signature. 

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature. 
CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2).

NAME: Bacteriophage-type RNA polymerase family active site signature 1. 
CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2. 
CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.

NAME: Eukaryotic RNA polymerase II heptapeptide repeat. 
CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].

NAME: RNA polymerases beta chain signature. 

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].
NAME: RNA polymerases M / 15 Kd subunits signature. 

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.
NAME: RNA polymerases D / 30 to 40 Kd subunits signature. 

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(9)-[SA]-x(3)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]- 
CONSENSUS: [GA]-x-R-[LI]-[GA]-[LIVM](2)-P. 

NAME: RNA polymerases H / 23 Kd subunits signature. 
CONSENSUS: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]. 

NAME: RNA polymerases K / 14 to 18 Kd subunits signature. 
CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q. 

NAME: RNA polymerases L / 13 to 16 Kd subunits signature. 

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature. 
CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G. 

NAME: DNA polymerase family A signature. 

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA]. 
NAME: DNA polymerase family B signature. 

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC]. 
NAME: DNA polymerase family X signature. 

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.
CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q. 

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.
CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G. 

NAME: ADP-glucose pyrophosphorylase signature 1.

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]. 

NAME: ADP-glucose pyrophosphorylase signature 2. 
CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3. 

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].
NAME: Phosphatidate cytidylyltransferase signature. 

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R- 
CONSENSUS: [LIVMFT]-D.

NAME: Ribonuclease PH signature. 

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A. 

NAME: 2'-5'-oligoadenylate synthetases signature 1. 

CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG]. 

NAME: 2'-5'-oligoadenylate synthetases signature 2. 
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.

NAME: CDP-alcohol phosphatidyltransferases signature. 
CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D. 

NAME: PEP-utilizing enzymes phosphorylation site signature. 

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2. 

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-
CONSENSUS: [LIVMF]-[GAS]-x(2)-R. 

NAME: Rhodanese signature 1. 

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]. 
NAME: Rhodanese C-terminal signature. 

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FVW].
NAME: CoA transferases signature 1. 

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P. 

NAME: CoA transferases signature 2. 

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site. 
CONSENSUS: C-C-x(2)-H-x(2)-C. 

NAME: Phospholipase A2 aspartic acid active site. 
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C. 
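
The consensus entries in this listing appear to follow the PROSITE pattern notation: elements separated by hyphens, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeat counts, and `<` or `>` for anchoring to the N- or C-terminus. Assuming that notation, the sketch below shows one way to translate a consensus into a Python regular expression for scanning a protein sequence. The function name `prosite_to_regex` and the test peptide are illustrative assumptions, not part of the original disclosure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus into a Python regular expression.

    Assumes the standard PROSITE conventions described in the lead-in above.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count,
        # e.g. "x(2)" -> ("x", "2", None), "[LIVM](2,3)" -> ("[LIVM]", "2", "3").
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                           # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"    # excluded residues
        else:
            token = residue                       # single letter or [..] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchored_start else "") + token
                     + ("$" if anchored_end else ""))
    return "".join(parts)


# Phospholipase A2 histidine active site, as listed above.
pla2_his = prosite_to_regex("C-C-x(2)-H-x(2)-C.")
# Phospholipase A2 aspartic acid active site, as listed above.
pla2_asp = prosite_to_regex("[LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.")

# Hypothetical peptide used only to exercise the pattern.
hit = re.search(pla2_his, "AACCGGHTTCAA")
```

Here `pla2_his` evaluates to `CC.{2}H.{2}C` and `pla2_asp` to `[LIVMA]C[^LIVMFYWPCST]CD.{5}C`. Entries labelled "profile" carry no consensus line and cannot be handled this way.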

NAME: Lipases, serine active site. 

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature. 
CONSENSUS: Y-x(2)-Y-Y-x-C-x-C. 

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site. 
CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site. 
CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site. 
CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site. 

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2. 

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].
NAME: Pectinesterase signature 1. 

CONSENSUS: [GSTN]-x(5)-[LIVM]-x-[LIVM]-x(2)-G-x-Y-[DNK]-E-x-[LIVM]-x-[LIVM].

NAME: Pectinesterase signature 2. 
CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.

NAME: Peptidyl-tRNA hydrolase signature 1. 

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]. 

NAME: Peptidyl-tRNA hydrolase signature 2. 

CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].

NAME: Alkaline phosphatase active site. 

CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature. 

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature. 

CONSENSUS: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-
CONSENSUS: [STA]. 

NAME: Class A bacterial acid phosphatases signature. 
CONSENSUS: G-S-Y-P-S-G-H-T.
NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2. 

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1-6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature. 
CONSENSUS: [LIVM]-R-G-N-H-E. 

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.
CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N. 

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2. 
CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D. 

NAME: Protein phosphatase 2C signature. 

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]. 

NAME: Tyrosine specific protein phosphatases active site. 

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile. 

NAME: Inositol monophosphatase family signature 1. 

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY].
NAME: Inositol monophosphatase family signature 2. 

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature. 
CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N. 

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile. 

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile. 

NAME: 3'5'-cyclic nucleotide phosphodiesterases signature. 
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature. 

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP]. 
NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.
NAME: Sulfatases signature 2. 

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1. 
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2. 

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3. 

CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S. 

NAME: AP endonucleases family 2 signature 1.

CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG]. 

NAME: AP endonucleases family 2 signature 2. 
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H.

NAME: AP endonucleases family 2 signature 3. 

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D. 
NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2. 
CONSENSUS: G-D-F-N-A-x-C-[SA]. 

NAME: Endonuclease III iron-sulfur binding region signature. 
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C. 

NAME: Endonuclease III family signature. 

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-
CONSENSUS: x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK]. 

NAME: Ribonuclease II family signature. 

CONSENSUS: [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-
CONSENSUS: [RQ]-[KR]-[FY]-x-D-x(3)-[HQ].

NAME: Ribonuclease III family signature. 

CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]. 
NAME: Bacterial Ribonuclease P protein component signature. 

CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR]. 

NAME: Ribonuclease T2 family histidine active site 1. 
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2. 

CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature. 
CONSENSUS: C-K-x(2)-N-T-F. 

NAME: DNA/RNA non-specific endonucleases active site. 
CONSENSUS: D-R-G-H-[QIL]-x(3)-A. 

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E. 

NAME: Thermonuclease family signature 2. 

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1.
CONSENSUS: H-x-C-G-G-N-V-G-D.

NAME: Beta-amylase active site 2. 

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature. 
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site. 

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.
NAME: Clostridium cellulosome enzymes repeated domain signature. 

CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-D-x(3)-[LIVMF]-x- 
CONSENSUS: [RKS]-x-[LIVMF]. 

NAME: Chitinases family 18 active site. 

CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.
NAME: Chitinases family 19 signature 1.

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].
NAME: Chitinases family 19 signature 2.

CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM].

NAME: Alpha-lactalbumin / lysozyme C signature. 
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C. 

NAME: Alpha-galactosidase signature. 

CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,4)-R-[DNSF].
NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2. 

CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P. 

NAME: Alpha-L-fucosidase putative active site. 
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C. 

NAME: Glycosyl hydrolases family 1 active site. 

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].
NAME: Glycosyl hydrolases family 1 N-terminal signature. 

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]. 
NAME: Glycosyl hydrolases family 2 signature 1. 

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)- 
CONSENSUS: G-[LIVMFYW](4). 

NAME: Glycosyl hydrolases family 2 acid/base catalyst. 

CONSENSUS: [DENQF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E. 
NAME: Glycosyl hydrolases family 3 active site. 

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-
CONSENSUS: [SGADNI].

NAME: Glycosyl hydrolases family 5 signature. 

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].

NAME: Glycosyl hydrolases family 6 signature 1. 

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G. 

NAME: Glycosyl hydrolases family 6 signature 2. 

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].
NAME: Glycosyl hydrolases family 8 signature. 

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].
NAME: Glycosyl hydrolases family 9 active sites signature 1. 

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R. 

NAME: Glycosyl hydrolases family 9 active sites signature 2. 
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA]. 

NAME: Glycosyl hydrolases family 10 active site. 

CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1. 
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2. 

CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF]. 
NAME: Glycosyl hydrolases family 16 active sites. 

CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].
NAME: Glycosyl hydrolases family 17 signature. 

CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].
NAME: Glycosyl hydrolases family 25 active sites signature. 

CONSENSUS: D-[LIVM]-x(3)-[NQ]-[PG]-x(9,10)-G-x(4)-[LIVMFY](2)-K-x-[ST]-E-[GS]-x(2)-
CONSENSUS: Y-x-[DN]. 

NAME: Glycosyl hydrolases family 31 active site. 
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2. 

CONSENSUS: G-[AV]-D-[LIVMT]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GS]-[SA]-
CONSENSUS: F-x-P-F-x-R-[DN]. 

NAME: Glycosyl hydrolases family 32 active site. 
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G. 

NAME: Glycosyl hydrolases family 35 putative active site. 
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY]. 
NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site. 
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA]. 

NAME: Prokaryotic transglycosylases signature. 

CONSENSUS: [LIVM]-x(3)-E-S-x(3)-[AP]-x(3)-S-x(5)-G-[LIVM]-[LIVMFYW]-x-[LIVMFYW]-
CONSENSUS: x(4)-[SAG]. 

NAME: Inosine-uridine preferring nucleoside hydrolase family signature. 
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A. 

NAME: Alkylbase DNA glycosidases alkA family signature. 

CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D.
NAME: Formamidopyrimidine-DNA glycosylase signature. 

CONSENSUS: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q. 

NAME: Uracil-DNA glycosylase signature. 

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y. 

NAME: S-adenosyl-L-homocysteine hydrolase signature 1. 

CONSENSUS: [CS]-N-x-[FYL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2. 
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A. 

NAME: Cytosol aminopeptidase signature. 
CONSENSUS: N-T-D-A-E-G-R-L. 

NAME: Aminopeptidase P and proline dipeptidase signature. 

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].
NAME: Methionine aminopeptidase subfamily 1 signature. 

CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV]. 
NAME: Methionine aminopeptidase subfamily 2 signature. 

CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].
NAME: Renal dipeptidase active site. 

CONSENSUS: [LIVM]-E-G-[GA]-x(2)-[LIVMF]-x(6)-L-x(3)-Y-x(2)-G-[LIVM]-R. 

NAME: Serine carboxypeptidases, serine active site. 
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site. 

CONSENSUS: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-
CONSENSUS: [PSA]. 

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature. 

CONSENSUS: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-
CONSENSUS: [LIVMFYTA]. 

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature. 
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site. 
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C. 

NAME: Serine proteases, trypsin family, serine active site. 

CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-
CONSENSUS: [LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site. 

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site. 

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM]. 

NAME: Serine proteases, subtilase family, serine active site. 
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].
NAME: Serine proteases, V8 family, histidine active site.

CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site. 
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY]. 

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T. 

NAME: Serine proteases, omptin family signature 2. 

CONSENSUS: A-G-Y-Q-E-[ST]-R-[FYW]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y. 
NAME: Prolyl endopeptidase family serine active site. 

CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2). 
NAME: Endopeptidase Clp serine active site. 

CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].

NAME: Endopeptidase Clp histidine active site. 

CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site. 
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site. 

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].
NAME: Eukaryotic thiol (cysteine) proteases asparagine active site. 

CONSENSUS: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-
CONSENSUS: [LIVMFYG]-x-[LIVMF]. 

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 cysteine active-site. 
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1. 

CONSENSUS: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-
CONSENSUS: Q. 

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site. 

CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site. 
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G. 

NAME: Eukaryotic and viral aspartyl proteases active site. 

CONSENSUS: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-
CONSENSUS: x-[LIVMFGTA]. 

NAME: Neutral zinc metallopeptidases, zinc-binding region signature. 

CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch. 

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature. 

CONSENSUS: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-
CONSENSUS: [GSTAN]-[GST]. 

// 

AC PS01016; 

DE Glycoprotease family signature. 

CONSENSUS: [KR]-[GSAT]-x(4)-[FYWHL]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-
CONSENSUS: [LIVM]. 

NAME: Proteasome A-type subunits signature. 

CONSENSUS: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-
CONSENSUS: [SAG].
NAME: Proteasome B-type subunits signature.

CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]- 
CONSENSUS: [GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. 

NAME: Signal peptidases I serine active site. 
CONSENSUS: [GS]-x-S-M-x-[PS]-[AT]-[LF]. 

NAME: Signal peptidases I lysine active site. 

CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY]. 
NAME: Signal peptidases I signature 3. 

CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]. 
NAME: Signal peptidases II signature. 

CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].
NAME: Peptidase family U32 signature. 

CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.
NAME: Amidases signature. 

CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-
CONSENSUS: x-[GA]-x-S-[LIVM]-R-x-P-[GSAC].

NAME: Asparaginase / glutaminase active site signature 1. 
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2. 
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM]. 

NAME: Urease nickel ligands signature. 

CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.
NAME: Urease active site. 

CONSENSUS: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A.

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1.
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV]. 

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2. 

CONSENSUS: [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-[LIVMA]-x(2)-[LIVMFY]-x(14,17)-[LIVM]-
CONSENSUS: x-[LIVMF]-[LIVMSTAG]-[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN].

NAME: Dihydroorotase signature 1. 

CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVFHRN]-x-[PGN]. 

NAME: Dihydroorotase signature 2. 
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K.

NAME: Beta-lactamase class-A active site. 

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC]. 

NAME: Beta-lactamase class-C active site. 
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K. 

NAME: Beta-lactamase class-D active site. 

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

NAME: Beta-lactamases class B signature 1. 

CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2. 

CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.

CONSENSUS: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA]. 

NAME: Arginase family signature 2. 

CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D. 

NAME: Arginase family signature 3. 

CONSENSUS: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G.
NAME: Adenosine and AMP deaminase signature.
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature. 

CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM]. 

NAME: GTP cyclohydrolase I signature 1. 

CONSENSUS: [EN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVM]-x(3)-[ST]-x-C-E-H-H.
NAME: GTP cyclohydrolase I signature 2. 

CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].
NAME: Nitrilases / cyanide hydratase signature 1.

CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.
NAME: Nitrilases / cyanide hydratase active site signature. 

CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature. 

CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]. 

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R. 

NAME: Acylphosphatase signature 2. 

CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G. 

NAME: ATP synthase alpha and beta subunits signature. 
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S. 

NAME: ATP synthase gamma subunit signature. 
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]. 

NAME: ATP synthase delta (OSCP) subunit signature. 

CONSENSUS: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-
CONSENSUS: x-[LIVM]-[KRHENQ]-x-[GSEN]. 

NAME: ATP synthase a subunit signature. 

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT]. 
NAME: ATP synthase c subunit signature. 

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site. 
CONSENSUS: D-K-T-G-T-[LI]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1. 

CONSENSUS: [FYW]-x(2)-[FYW]-x-[FYW]-[DN]-x(6)-[LIVM]-G-R-T-x(3)-W. 

NAME: Sodium and potassium ATPases beta subunits signature 2. 
CONSENSUS: [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G. 

NAME: GDA1/CD39 family of nucleoside phosphatases signature.

CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site. 
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F.

NAME: Cutinase, serine active site. 

CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G. 

NAME: Cutinase, aspartate and histidine active sites. 

CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site. 

CONSENSUS: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-
CONSENSUS: x(2)-[RK].

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site. 
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2). 

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site. 

CONSENSUS: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-
CONSENSUS: [GTE].
NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.

CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-
CONSENSUS: [GSTPCEQ].

NAME: Orotidine 5'-phosphate decarboxylase active site.

CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1. 
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH].

NAME: Phosphoenolpyruvate carboxylase active site 2. 
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G. 

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature. 
CONSENSUS: F-P-S-A-C-G-K-T-N. 

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature. 
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N. 

NAME: Uroporphyrinogen decarboxylase signature 1. 
CONSENSUS: P-x-W-x-M-R-Q-A-G-R. 

NAME: Uroporphyrinogen decarboxylase signature 2. 

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]. 
NAME: Indole-3-glycerol phosphate synthase signature. 

CONSENSUS: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST].

NAME: Ribulose bisphosphate carboxylase large chain active site. 
CONSENSUS: G-x-[DN]-F-x-K-x-D-E. 

NAME: Fructose-bisphosphate aldolase class-I active site. 
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN]. 

NAME: Fructose-bisphosphate aldolase class-II signature 1. 

CONSENSUS: [FYVM]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH].

NAME: Fructose-bisphosphate aldolase class-II signature 2. 
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature. 

CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F.

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site. 
CONSENSUS: S-V-A-G-L-G-G-C-P-Y. 

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site. 
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature. 

CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1. 
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2. 
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site. 
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R. 

NAME: KDPG and KHG aldolases Schiff-base forming residue. 
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G. 

NAME: Isocitrate lyase signature. 
CONSENSUS: K-[KR]-C-G-H-[LMQ]. 

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site. 
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1. 

CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F. 

NAME: DNA photolyases class 2 signature 2. 

CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N. 
NAME: Eukaryotic-type carbonic anhydrases signature. 

CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1. 
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2. 

CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature. 
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N. 

NAME: Aconitase family signature 1.

CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-
CONSENSUS: [LIVMA].

NAME: Aconitase family signature 2. 

CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1. 
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA]. 

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2. 
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site. 

CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].
NAME: Dehydroquinase class II signature. 

CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G. 
NAME: Enolase signature. 

CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]. 

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site. 

CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature. 

CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-
CONSENSUS: [DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1. 

CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2.
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K. 

NAME: Tryptophan synthase alpha chain signature. 

CONSENSUS: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G.

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site. 
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N. 

NAME: Delta-aminolevulinic acid dehydratase active site. 
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y. 

NAME: Urocanase active site. 
CONSENSUS: F-Q-G-L-P-x-R-I-C-W. 

NAME: Prephenate dehydratase signature 1. 

CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM].

NAME: Prephenate dehydratase signature 2. 
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1. 



CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].
NAME: Dihydrodipicolinate synthetase signature 2. 

CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-
CONSENSUS: K-[DEQAF]-[STAC].

NAME: RsuA family of pseudouridine synthase signature. 
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site. 
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature. 

CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA]. 

NAME: Porphobilinogen deaminase cofactor-binding site. 

CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].
NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site. 

CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH].
NAME: Glyoxalase I signature 1. 

CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF].
NAME: Glyoxalase I signature 2. 

CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC]. 

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E. 

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W. 

NAME: Adenylate cyclases class-I signature 1. 
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K.

NAME: Adenylate cyclases class-I signature 2. 

CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.

NAME: Guanylate cyclases signature. 

CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-
CONSENSUS: [DNTA]-x(5)-[DE]. 

NAME: Chorismate synthase signature 1. 

CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]. 
NAME: Chorismate synthase signature 2. 

CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.
NAME: Chorismate synthase signature 3. 

CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1.
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y. 

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2. 
CONSENSUS: D-H-K-N-L-D-x-D. 

NAME: Ferrochelatase signature. 

CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y. 

NAME: Alanine racemase pyridoxal-phosphate attachment site. 
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G. 

NAME: Aspartate and glutamate racemases signature 1.

CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].
NAME: Aspartate and glutamate racemases signature 2. 

CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1. 
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(7)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].
NAME: Ribulose-phosphate 3-epimerase family signature 1. 

CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].
NAME: Ribulose-phosphate 3-epimerase family signature 2. 

CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site. 
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature. 

CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.
NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile. 
NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1. 

CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].
NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2. 

CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-
CONSENSUS: x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G. 

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile. 

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature. 

CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-
CONSENSUS: [GS]. 

NAME: Triosephosphate isomerase active site. 
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1.
CONSENSUS: [LI]-E-P-K-P-x(2)-P. 

NAME: Xylose isomerase signature 2. 

CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]. 

NAME: Phosphomannose isomerase type I signature 1.
CONSENSUS: Y-x-D-x-N-H-K-P-E. 

NAME: Phosphomannose isomerase type I signature 2. 

CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K. 
NAME: Phosphoglucose isomerase signature 1. 

CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G. 
NAME: Phosphoglucose isomerase signature 2. 

CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K. 

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature. 

CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature. 
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N. 

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature. 
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature. 

CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]- 
CONSENSUS: G-S. 

NAME: Terpene synthases signature. 

CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site. 

CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site. 

CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature. 
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].
NAME: Aminoacyl-transfer RNA synthetases class-I signature.

CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]- 
CONSENSUS: [LIVMFYSTAGPC]. 

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]. 

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2. 

CONSENSUS: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].
NAME: WHEP-TRS domain signature. 

CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]- 
CONSENSUS: x(2)-[IV]-x(2)-L-x(3)-K. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1. 

CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-
CONSENSUS: G-D. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site. 
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH]. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3. 

CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-x(4)-D-x-[LIVM]-x(3)- 
CONSENSUS: G-[GRE]. 

NAME: Glutamine synthetase signature 1. 

CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]. 

NAME: Glutamine synthetase putative ATP-binding region signature. 
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S. 

NAME: Glutamine synthetase class-I adenylation site. 
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y. 

NAME: D-alanine-D-alanine ligase signature 1.

CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA]. 

NAME: D-alanine-D-alanine ligase signature 2. 

CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1. 

CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.

NAME: SAICAR synthetase signature 2. 

CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.

NAME: Folylpolyglutamate synthase signature 1. 

CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK].
NAME: Folylpolyglutamate synthase signature 2. 

CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2). 

NAME: Ubiquitin-activating enzyme signature 1. 
CONSENSUS: K-A-C-S-G-K-F-x-P. 

NAME: Ubiquitin-activating enzyme active site. 
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site. 

CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV].

NAME: Formate-tetrahydrofolate ligase signature 1. 
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y. 

NAME: Formate-tetrahydrofolate ligase signature 2. 
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site. 
CONSENSUS: Q-W-G-D-E-G-K-G. 

NAME: Adenylosuccinate synthetase active site. 
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.
NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S. 

NAME: Argininosuccinate synthase signature 2. 
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F. 

NAME: Phosphoribosylglycinamide synthetase signature. 
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1. 

CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG].
NAME: Carbamoyl-phosphate synthase subdomain signature 2. 

CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].

NAME: ATP-dependent DNA ligase AMP-binding site. 
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM]. 

NAME: ATP-dependent DNA ligase signature 2. 

CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K. 

NAME: NAD-dependent DNA ligase signature 1. 

CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G-
CONSENSUS: [DE]-[DENL]. 

NAME: NAD-dependent DNA ligase signature 2. 

CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-[PS]-V. 

NAME: RNA 3'-terminal phosphate cyclase signature. 
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature.

CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y. 

NAME: Isopenicillin N synthetase signature 1. 
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL].

NAME: Isopenicillin N synthetase signature 2. 

CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site. 
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q. 

NAME: Site-specific recombinases signature 2. 

CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].
NAME: Transposases, Mutator family, signature. 

CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H. 

NAME: Transposases, IS30 family, signature. 

CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.
NAME: Autoinducers synthetases family signature. 

CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D. 
NAME: Thiamine pyrophosphate enzymes signature. 

CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].
NAME: Biotin-requiring enzymes attachment site. 

CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-
CONSENSUS: [SAV]. 

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site. 
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY]. 

NAME: Putative AMP-binding domain signature. 

CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1. 
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.
NAME: Molybdenum cofactor biosynthesis proteins signature 2.

CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-[LIV]-x(2)-[KR]-P-G-[KRL]-P-x(2)-
CONSENSUS: [LIVMF]-[GA]. 

NAME: moaA / nifB / pqqE family signature. 

CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.

NAME: Radical activating enzymes signature. 

CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].

NAME: Tpx family signature. 

CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C. 

NAME: Cytochrome c family heme-binding site signature. 
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}. 
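
The consensus entries in this listing use the PROSITE pattern notation: residues are separated by hyphens, x stands for any residue, [..] lists the residues allowed at a position, {..} lists the residues excluded, (n) or (n,m) gives a repeat count, and < or > anchors the pattern to the N- or C-terminus. As a minimal sketch only, and not part of the original disclosure, such a pattern could be translated into a regular expression along the following lines. The function name prosite_to_regex and the example sequence are assumptions made for illustration; anchors written inside brackets are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a regex string (illustrative sketch)."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            atom = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"      # excluded residues
        else:
            atom = core                         # literal residue or [..] class
        if lo is not None:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + atom + ("$" if anchor_end else ""))
    return "".join(parts)

# Hypothetical usage with the cytochrome c heme-binding site entry above.
regex = prosite_to_regex("C-{CPWHF}-{CPWR}-C-H-{CFYW}.")
hit = re.search(regex, "MKAACAACHAAK")  # made-up sequence, for illustration only
```

Here regex holds the translated expression and hit is a match object when the made-up sequence contains the motif.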

NAME: Cytochrome b5 family, heme-binding domain signature. 
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G. 

NAME: Cytochrome b/b6 heme-ligand signature. 

CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H. 

NAME: Cytochrome b/b6 Qo site signature. 
CONSENSUS: P-[DE]-W-[FY]-[LFY](2). 

NAME: Cytochrome b559 subunits heme-binding site signature. 

CONSENSUS: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-[LIV]-[STGA]-[IV]-P.
NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.

CONSENSUS: R-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[LIVMF]-[STAC]-[LIVM]-x(2)-L-x-[LIVM]-T-G.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2.

CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1. 

CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST]. 

NAME: Succinate dehydrogenase cytochrome b subunit signature 2. 
CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA]. 

NAME: Thioredoxin family active site. 

CONSENSUS: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-
CONSENSUS: [PHYWSTA]-C-x(6)-[LIVMFYWT].

NAME: Glutaredoxin active site. 

CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].
NAME: Type-1 copper (blue) proteins signature. 

CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C. 

NAME: Adrenodoxin family, iron-sulfur binding region signature. 
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG]. 

NAME: High potential iron-sulfur proteins signature. 
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW]. 

NAME: Rieske iron-sulfur protein signature 1.
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2. 
CONSENSUS: C-P-C-H-x-[GSA]. 

NAME: Flavodoxin signature. 

CONSENSUS: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]. 

NAME: Rubredoxin signature. 

CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].
NAME: Electron transfer flavoprotein alpha-subunit signature.

CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-
CONSENSUS: [IV]-N.

NAME: Electron transfer flavoprotein beta-subunit signature. 

CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC]. 

NAME: Vertebrate metallothioneins signature. 

CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K. 

NAME: Ferritin iron-binding regions signature 1.

CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R. 
NAME: Ferritin iron-binding regions signature 2. 

CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN]. 
NAME: Bacterioferritin signature. 

CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.

NAME: Transferrins signature 1.

CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].
NAME: Transferrins signature 2. 

CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW]. 
NAME: Transferrins signature 3. 

CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile. 

NAME: Protozoan/cyanobacterial globins signature. 

CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H. 

NAME: Plant hemoglobins signature. 
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F. 

NAME: Hemerythrins signature. 
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F. 

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P. 

NAME: Arthropod hemocyanins / insect LSPs signature 2. 
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW].

NAME: Heavy-metal-associated domain. 

CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS]. 

NAME: ABC transporters family signature. 

CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]- 
CONSENSUS: [KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign. 

CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR]. 

NAME: ABC-2 type transport system integral membrane proteins signature. 

CONSENSUS: [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-[GSAIV]-x(6)-[LIMGA]-[PGSNQ]-
CONSENSUS: x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature. 

CONSENSUS: [GAP]-[LIVMFA]-[STAVDN]-x(4)-[GSAV]-[LIVMFY](2)-Y-[ND]-x(3)-[LIVMF]-x-
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature. 

CONSENSUS: G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-[VAGC]-x(2)-[LIVMAGN].
NAME: Bacterial extracellular solute-binding proteins, family 5 signature. 

CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]-x-[LIVMFY]-x-[LIVM]-[KR]-
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].
NAME: Serum albumin family signature. 

CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].
NAME: Transthyretin signature 1.

CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G. 
NAME: Transthyretin signature 2. 

CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P. 
NAME: Avidin / Streptavidin family signature. 

CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature. 
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature. 

CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-
CONSENSUS: [LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature. 

CONSENSUS: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)- 
CONSENSUS: [LIVMAKR]. 

NAME: Acyl-CoA-binding protein signature. 

CONSENSUS: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G.
NAME: LBP / BPI / CETP family signature. 

CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-
CONSENSUS: x(8)-P. 

NAME: Phosphatidylethanolamine-binding protein family signature. 
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H. 

NAME: Plant lipid transfer proteins signature. 

CONSENSUS: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-
CONSENSUS: [DN]-C-x(2)-[LIVM].

NAME: Uteroglobin family signature 1. 

CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-[LIVMF](2).
NAME: Uteroglobin family signature 2. 

CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C.

NAME: Mitochondrial energy transfer proteins signature. 

CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QMAIGV]. 

NAME: Sugar transport proteins signature 1. 

CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-
CONSENSUS: [GSTA]. 

NAME: Sugar transport proteins signature 2. 

CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1. 

CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W. 

NAME: LacY family proton/sugar symporters signature 2. 

CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1. 

CONSENSUS: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-
CONSENSUS: [IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA]. 

NAME: PTR2 family proton/oligopeptide symporters signature 2. 

CONSENSUS: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF].
NAME: Amiloride-sensitive sodium channels signature. 

CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]-C-x-[QT]-x(2)-[LIVMT]-[LIVMS]-x(2)-C-x-C.
NAME: Sodium:alanine symporter family signature. 

CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LIVMFA](2)-G.
NAME: Sodium:dicarboxylate symporter family signature 1.

CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.
NAME: Sodium:dicarboxylate symporter family signature 2. 

CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FY]-[LI]-[SA]-Q. 

NAME: Sodium:galactoside symporter family signature. 

CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1.
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2. 

CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-x(2)-C-x(4)-N-[GST].
NAME: Sodium:solute symporter family signature 1. 

CONSENSUS: [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-[TAV]-x(2)-G-G-[LMF]-x-
CONSENSUS: [SAP]. 

NAME: Sodium:solute symporter family signature 2. 

CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-
CONSENSUS: x-[LIVMG]. 

NAME: Sodium:sulfate symporter family signature.

CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V.

NAME: glpT family of transporters signature. 
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G. 

NAME: Ammonium transporters signature. 

CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)- 
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R. 

NAME: BCCT family of transporters signature. 
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W. 

NAME: Flagellar motor protein motA family signature. 
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1.

CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].

NAME: Formate and nitrite transporters signature 2. 
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S. 

NAME: Prokaryotic sulfate-binding proteins signature 2. 
CONSENSUS: N-P-K-[ST]-S-G-x-A-R. 

NAME: Sulfate transporters signature. 

CONSENSUS: P-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-[GSTA](2)-S-[KR].
NAME: Amino acid permeases signature. 

CONSENSUS: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-
CONSENSUS: [STAGC]-x(3)-[LIVMFYW]-x-[LIVMST]-x(3)-[LIMCTA]-[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature. 

CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature. 

CONSENSUS: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G.

NAME: Anion exchangers family signature 1. 

CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y. 

NAME: Anion exchangers family signature 2. 
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.

NAME: MIP family signature. 

CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].

NAME: General diffusion Gram-negative porins signature. 

CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.
NAME: OmpA-like domain. 

CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)- 
CONSENSUS: [LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G. 

NAME: Eukaryotic mitochondrial porin signature. 

CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)- 
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature. 
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C. 

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F. 

NAME: GNS1/SUR4 family signature. 
CONSENSUS: L-x-F-L-H-x-Y-H-H. 

NAME: 43 Kd postsynaptic protein signature. 
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I. 

NAME: Actins signature 1.

CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G. 
NAME: Actins signature 2. 

CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].
NAME: Actins and actin-related proteins signature. 

CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].
NAME: Annexins repeated domain signature. 

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature. 
CONSENSUS: F-E-D-V-I-A-E-P. 

NAME: Clathrin light chain signature 1.
CONSENSUS: F-L-A-Q-Q-E-S. 

NAME: Clathrin light chain signature 2. 

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1.
CONSENSUS: C-K-P-C-L-K-x-T-C. 

NAME: Clusterin signature 2. 

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C. 
NAME: Connexins signature 1.

CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D. 
NAME: Connexins signature 2. 

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.
NAME: Crystallins beta and gamma 'Greek key' motif signature. 

CONSENSUS: [LIVMFYWA]-x-{DEHRKSTP}-[FY]-[DEQHKY]-x(3)-[FY]-x-G-x(4)-[LIVMFCST]. 
NAME: Dynamin family signature. 

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R. 

NAME: Dynein light chain type 1 signature. 

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E. 

NAME: FtsZ protein signature 1. 

CONSENSUS: N-[ST]-D-x-Q-x-L-x(16,18)-G-x-G-[ATV]-G-[GSAN]-x-P-x(2)-G.
NAME: FtsZ protein signature 2. 

CONSENSUS: [DNHKR]-[LIVMF]-x-[LIVMF](2)-[VSTAC]-[STAC]-G-x-G-[GK]-G-T-G-[ST]-G-
CONSENSUS: [GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].
NAME: Fungal hydrophobins signature.

CONSENSUS: [GN]-[DNQPSA]-x-C-[GSTANK]-[GSTADNQ]-[STNQI]-[PTIV]-x-C-C-[DENQKPST]. 

NAME: Intermediate filaments signature. 

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]. 

NAME: Involucrin signature. 

CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV]. 
NAME: Kinesin motor domain signature. 

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.
NAME: Kinesin motor domain profile. 
NAME: Kinesin light chain repeat. 

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-x(3)-N-x-L-[AS]-
CONSENSUS: x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature. 
CONSENSUS: V-V-H-F-F-K-N. 

NAME: Myelin P0 protein signature. 

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1. 
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H. 

NAME: Myelin proteolipid protein signature 2. 

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1. 
CONSENSUS: <M-L-C-C-[LIVM]-R-R. 

NAME: Neuromodulin (GAP-43) signature 2. 
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature. 

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K. 

NAME: Peripherin / rom-1 signature. 

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C. 

NAME: Profilin signature. 

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites. 
CONSENSUS: I-P-C-C-P-V. 

NAME: Synapsins signature 1.
CONSENSUS: L-R-R-R-L-S-D-S. 

NAME: Synapsins signature 2. 
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K.

NAME: Synaptobrevin signature. 

CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-
CONSENSUS: [KR]-[TA]-[DE]. 

NAME: Synaptophysin / synaptoporin signature. 
CONSENSUS: L-S-V-[DE]-C-x-N-K-T. 

NAME: Tropomyosins signature. 
CONSENSUS: L-K-E-A-E-x-R-A-E. 

NAME: Tubulin subunits alpha, beta, and gamma signature. 
CONSENSUS: [SAG]-G-G-T-G-[SA]-G. 

NAME: Tubulin-beta mRNA autoregulation signal. 
CONSENSUS: <M-R-[DE]-[IL]. 

NAME: Tau and MAP proteins tubulin-binding domain signature. 
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2). 

NAME: Neuraxin and MAP1B proteins repeated region signature.
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGCI].

NAME: F-actin capping protein alpha subunit signature 1. 
CONSENSUS: V-H-[FY](2)-E-D-G-N-V. 

NAME: F-actin capping protein alpha subunit signature 2. 
CONSENSUS: F-K-[AE]-L-R-R-x-L-P. 

NAME: F-actin capping protein beta subunit signature. 
CONSENSUS: C-D-Y-N-R-D. 

NAME: Vinculin family talin-binding region signature. 

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L. 

NAME: Vinculin repeated domain signature. 
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P. 

NAME: Amyloidogenic glycoprotein extracellular domain signature. 
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P. 

NAME: Amyloidogenic glycoprotein intracellular domain signature. 
CONSENSUS: G-Y-E-N-P-T-Y-[KR]. 

NAME: Cadherins extracellular repeated domain signature. 
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

NAME: Insect cuticle proteins signature. 

CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-[AP]. 
NAME: Gas vesicles protein GVPa signature 1. 

CONSENSUS: [LIVM]-x-[DE]-[LIVMFYT]-[LIVM]-[DE]-x-[LIVM](2)-[DKR](2)-G-x-[LIVM](2).

NAME: Gas vesicles protein GVPa signature 2. 

CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature. 
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F. 

NAME: Bacterial microcompartiments proteins signature. 

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA]. 

NAME: Flagella basal body rod proteins signature. 

CONSENSUS: [GTARYQ]-x(9)-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-[LIVM]-[SAN]-N-x-[SADNFR]-
CONSENSUS: [STV]. 

NAME: Flagella transport protein fliP family signature 1. 

CONSENSUS: [PA]-A-[FY]-x-[LIVT]-[STH]-[EQ]-[LI]-x(2)-[GA]-F-[KREQ]-[IM]-G-[LIF].

NAME: Flagella transport protein fliP family signature 2. 
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature. 

CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature. 

CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-[GAST](2). 

NAME: Neurotransmitter-gated ion-channels signature. 
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C. 

NAME: ATP P2X receptors signature. 

CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[TV]-x-W-x-C-[DN]-L-D-x(5)-C-x-P-x-Y-x-F.
NAME: G-protein coupled receptors signature. 

CONSENSUS: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-
CONSENSUS: [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].

NAME: G-protein coupled receptors family 2 signature 1. 

CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-x(8,9)-C-[PF]. 
NAME: G-protein coupled receptors family 2 signature 2. 

CONSENSUS: Q-G-[LMFCA]-[LIVMFT]-[LIV]-x-[LIVFST]-[LIF]-[VFYH]-C-[LFY]-x-N-x(2)-V.
NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN]. 
NAME: G-protein coupled receptors family 3 signature 2. 

CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3. 
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.

NAME: Visual pigments (opsins) retinal binding site. 

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]- 
CONSENSUS: [AP]-x(2)-[IY]. 

NAME: Bacterial rhodopsins signature 1. 

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3). 
NAME: Bacterial rhodopsins retinal binding site. 

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature. 
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R. 

NAME: Receptor tyrosine kinase class III signature. 
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T. 

NAME: Receptor tyrosine kinase class V signature 1. 

CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-
CONSENSUS: x(3)-[KR]-C-[PSAW]. 

NAME: Receptor tyrosine kinase class V signature 2. 

CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1.
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W. 

NAME: Growth factor and cytokines receptors family signature 2. 
CONSENSUS: [STGL]-x-W-[SG]-x-W-S. 

NAME: TNFR/NGFR family cysteine-rich region signature. 

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C. 

NAME: TNFR/NGFR family cysteine-rich region domain. 

NAME: Integrins alpha chain signature. 
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R. 

NAME: Integrins beta chain cysteine-rich domain signature. 
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature. 
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W.

NAME: Photosynthetic reaction center proteins signature. 

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature. 

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW]. 

NAME: Antenna complexes beta subunits signature. 

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature. 
CONSENSUS: C-D-G-P-G-R-G-G-T-C. 

NAME: Photosystem I psaG and psaK proteins signature. 

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature. 
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile.
NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G. 
NAME: TonB-dependent receptor proteins signature 1.

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].

NAME: TonB-dependent receptor proteins signature 2. 

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F>.

NAME: Transmembrane 4 family signature. 

CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)- 
CONSENSUS: [CWN]-[LIVM](2). 

NAME: Bacterial chemotaxis sensory transducers signature. 

CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[EQ]-T-A-A-S-M-E-Q-L-T-A-T-V. 

NAME: ER lumen protein retaining receptor signature 1.
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y. 

NAME: ER lumen protein retaining receptor signature 2. 
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L. 

NAME: Ephrins signature. 

CONSENSUS: [KRQ]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-G-x-E-F-x(5)-[FY](2)-
CONSENSUS: x(2)-[SA].

NAME: Granulins signature. 

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C.

NAME: HBGF/FGF family signature. 

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y. 
NAME: PTN/MK heparin-binding protein family signature 1. 

CONSENSUS: S-[DE]-C-x-[DE]-W-x-W-x(2)-C-x-P-x-[SN]-x-D-C-G-[LIVMA]-G-x-R-E-G. 
NAME: PTN/MK heparin-binding protein family signature 2. 

CONSENSUS: C-[KR]-[LIVM]-P-C-N-W-K-K-x-F-G-A-[DE]-C-K-Y-x-F-[EQ]-x-W-G-x-C.

NAME: Nerve growth factor family signature. 

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature. 
CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C.

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature. 

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-
CONSENSUS: [SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x(9,10)-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature. 

CONSENSUS: C-C-[LIFYT]-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[FYW]-x(6,8)-C-x(3,4)-[SAG]-
CONSENSUS: [LIVM](2)-[FL]-x(8)-C-[STA].

NAME: TGF-beta family signature. 

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.
NAME: TNF family signature. 

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-
CONSENSUS: [LIVMFY].

NAME: TNF family profile. 

NAME: Wnt-1 family signature. 

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C. 

NAME: Interferon alpha, beta and delta family signature. 

CONSENSUS: [FYH]-[FY]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W. 

NAME: Granulocyte-macrophage colony-stimulating factor signature. 
CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.

NAME: Interleukin-1 signature. 

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].
NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L. 

NAME: Interleukins -4 and -13 signature. 

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature. 
CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L. 

NAME: Interleukin-7 and -9 signature. 
CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L. 

NAME: Interleukin-10 family signature.

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.
NAME: LIF / OSM family signature. 

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK]. 

NAME: Macrophage migration inhibitory factor family signature. 
CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G. 

NAME: Adipokinetic hormone family signature. 
CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W. 

NAME: Bombesin-like peptides family signature. 
CONSENSUS: W-A-x-G-[SH]-[LF]-M. 

NAME: Calcitonin / CGRP / IAPP family signature. 

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].
NAME: Corticotropin-releasing factor family signature. 

CONSENSUS: [PQ]-x-[LIVM]-S-[LIVM]-x(2)-[PST]-[LIVMF]-x-[LIVM]-L-R-x(2)-[LIVM].

NAME: Crustacean CHH/MIH/GIH neurohormones family signature. 
CONSENSUS: C-[DENK]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C. 

NAME: Erythropoietin / thrombopoietin signature.
CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C. 

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L. 
NAME: Granins signature 2. 

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.
NAME: Galanin signature. 

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature. 
CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature. 

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-
CONSENSUS: [LIVMFYG]-x(9)-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ]. 

NAME: Glycoprotein hormones alpha chain signature 1. 
CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2. 
CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K. 

NAME: Glycoprotein hormones beta chain signature 1. 
CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST]. 

NAME: Glycoprotein hormones beta chain signature 2. 

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C. 

NAME: Gonadotropin-releasing hormones signature. 
CONSENSUS: Q-H-[FYW]-S-x(4)-P-G. 

NAME: Insulin family signature. 

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
NAME: Natriuretic peptides signature.
CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C.

NAME: Neurohypophysial hormones signature. 
CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G. 

NAME: Neuromedin U signature. 
CONSENSUS: F-[LIVMF]-F-R-P-R-N. 

NAME: Endogenous opioids neuropeptides precursors signature. 

CONSENSUS: C-x(3)-C-x(2)-C-x(2)-[KRH]-x(6,7)-[LIF]-[DN]-x(3)-C-x-[LIVM]-[EQ]-C- 
CONSENSUS: [EQ]-x(8)-W-x(2)-C. 

NAME: Pancreatic hormone family signature. 

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF]. 

NAME: Parathyroid hormone family signature. 
CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G. 

NAME: Pyrokinins signature. 
CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1. 

CONSENSUS: C-x-[ST]-x(2)-[LIVMFY]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)- 
CONSENSUS: [LIVMFY]-x(2)-[STA]-W. 

NAME: Somatotropin, prolactin and related hormones signature 2. 

CONSENSUS: C-[LIVMFY]-x(2)-D-[LIVMFYSTA]-x(5)-[LIVMFY]-x(2)-[LIVMFYT]-x(2)-C.

NAME: Tachykinin family signature. 
CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature. 
CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature. 
CONSENSUS: C-F-W-K-Y-C. 

NAME: Cecropin family signature. 

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN]. 

NAME: Mammalian defensins signature. 
CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C. 

NAME: Arthropod defensins signature. 

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.
NAME: Cathelicidins signature 1. 

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].
NAME: Cathelicidins signature 2. 

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE]. 

NAME: Endothelin family signature. 
CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature. 
CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C.

NAME: Gamma-thionins family signature. 

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C. 
NAME: Snake toxins signature. 

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].
NAME: Myotoxins signature. 

CONSENSUS: K-x-C-H-x-K-x(2)-H-C-x(2)-K-x(3)-C-x(8)-K-x(2)-C-x(2)-[RK]-x-K-C-C-K-K. 
NAME: Scorpion short toxins signature. 

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C. 

NAME: Heat-stable enterotoxins signature. 
CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C.
NAME: Aerolysin type toxins signature.
CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T. 

NAME: Shiga/ricin ribosomal inactivating toxins active site signature. 

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]- 
CONSENSUS: x(2)-[LIVMF]. 

NAME: Channel forming colicins signature. 
CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E. 

NAME: Hok/gef family cell toxic proteins signature. 

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1.
CONSENSUS: Y-G-G-[LIV]-T-x(4)-N.

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.
CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature. 
CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK]. 

NAME: Membrane attack complex components / perforin signature. 
CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY]. 

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature. 
CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C. 

NAME: Bowman-Birk serine protease inhibitors family signature. 

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C. 

NAME: Kazal serine protease inhibitors family signature. 
CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C. 

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature. 
CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]. 

NAME: Serpins signature. 

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-
CONSENSUS: [LIVMFAH]. 

NAME: Potato inhibitor I family signature. 

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature. 
CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C. 

NAME: Streptomyces subtilisin-type inhibitors signature. 
CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L. 

NAME: Cysteine proteases inhibitors signature. 

CONSENSUS: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-
CONSENSUS: [DENQKRHSIV]. 

NAME: Tissue inhibitors of metalloproteinases signature. 
CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C. 

NAME: Cereal trypsin/alpha-amylase inhibitors family signature. 

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature. 
CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature. 

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.
CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L. 

NAME: Chaperonins cpn60 signature.
CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA].

NAME: Chaperonins cpn10 signature.

CONSENSUS: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-
CONSENSUS: [LIVMFY](3). 
NAME: Chaperonins TCP-1 signature 1. 

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2). 
NAME: Chaperonins TCP-1 signature 2. 

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-
CONSENSUS: [SNH]-[PQH]. 

NAME: Chaperonins TCP-1 signature 3. 

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T.

NAME: Heat shock hsp20 proteins family profile. 

NAME: Heat shock hsp70 proteins family signature 1. 
CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2. 

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]- 
CONSENSUS: [LIVMFC]. 

NAME: Heat shock hsp70 proteins family signature 3. 

CONSENSUS: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA].

NAME: Heat shock hsp90 proteins family signature. 
CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1.

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.
NAME: Chaperonins clpA/B signature 2. 

CONSENSUS: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-
CONSENSUS: [STA].

NAME: Nt-dnaJ domain signature. 

CONSENSUS: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]. 

NAME: dnaJ domain profile. 

NAME: CXXCXGXG dnaJ domain signature. 

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.
NAME: grpE protein signature. 

CONSENSUS: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-
CONSENSUS: [LIVM]-[RI]-x-[SA]-x-V-x-[IV].

NAME: Bacterial type II secretion system protein C signature. 

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L. 
NAME: Bacterial type II secretion system protein D signature. 

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-
CONSENSUS: [LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F.

NAME: Bacterial type II secretion system protein E signature. 
CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature.

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[SAIV]-[LIVM]-x-[TY]-P-x(2)-[LIVM]-x(3)-[STAGV]-x(6)- 
CONSENSUS: [LMY]-x(3)-[LIVMF](2)-P.

NAME: Bacterial type II secretion system protein N signature. 
CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature. 

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D- 
CONSENSUS: [GSA]-D. 

NAME: Protein secA signatures. 

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.
NAME: Protein secY signature 1. 

CONSENSUS: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-
CONSENSUS: [LIVMFAT](3)-Q-[LIVMFA](2).
NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-
CONSENSUS: [LIVMF](3). 

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)- 
CONSENSUS: [LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature. 

CONSENSUS: [LIVMFY]-[APN]-x-[DNS]-[KREQ]-E-[STR]-[LIVMAR]-x-[FYWT]-x-[NC]-[LIVM]- 
CONSENSUS: x(2)-[LIVM]-P-[PAS]. 

NAME: Fimbrial biogenesis outer membrane usher protein signature. 

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].
NAME: SRP54-type proteins GTP-binding domain signature. 

CONSENSUS: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]. 

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.
CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G. 

NAME: Cyclin-dependent kinases regulatory subunits signature 1. 

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2. 
CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR]. 

NAME: Pentaxin family signature. 
CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature. 
CONSENSUS: [FY]-x-C-x-[VA]-x-H. 

NAME: Prion protein signature 1. 

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y.
NAME: Prion protein signature 2. 

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.
NAME: Cyclins signature. 

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-tFYWS]-[LrVMl-x(8)-[LlVMFC]-x(4)-rLlVMFYAl-x(2)- 
CONSENSUS: [STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW] . 

NAME: Proliferating cell nuclear antigen signature 1. 

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-
CONSENSUS: [VGA]-x-[LIVM]-x-[LIVM]-x(4)-F. 

NAME: Proliferating cell nuclear antigen signature 2. 

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-
CONSENSUS: [LIVMF](2). 

NAME: Actin-depolymerizing proteins signature. 

CONSENSUS: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-
CONSENSUS: [KR]. 

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2). 
NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature. 

CONSENSUS: [LVME]-[FT]-x-[GSD]-[GL]-x(1,2)-[NS]-[YW]-G-R-[LIV]-[LIVC]-[GAT]-
CONSENSUS: [LIVMF](2)-x-F-[GSAE]-[GSARY].

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature. 
CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFTC].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature. 

CONSENSUS: [LIVAT]-x(3)-L-[KARQ]-x-[IVAL]-G-D-[DESG]-[LIMFV]-[DENSHQ]-[LVSHRQ]-
CONSENSUS: [NSR].

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature. 

CONSENSUS: [DS]-[NT]-R-[AE]-[LI]-V-x-[KD]-[FY]-[LIV]-[GHS]-Y-K-L-[SR]-Q-[RK]-G-
CONSENSUS: [HY]-x-[CW]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile. 
NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DE]-[LIVM](2)-G-[LIVM]-x-F-x-[RK]-[DEQ]-[LIVM].
NAME: AAA-protein family signature. 

CONSENSUS: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-
CONSENSUS: x-R. 

NAME: Ubiquitin domain signature. 

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-
CONSENSUS: [LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile. 

NAME: ADP-ribosylation factors family signature. 

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-
CONSENSUS: [WK]-[LIVM]. 

NAME: GTP-binding nuclear protein ran signature. 
CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-
CONSENSUS: x-Q-Y. 

NAME: Band 7 protein family signature. 

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x- 
CONSENSUS: [KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature.

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-
CONSENSUS: [LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile. 

NAME: Ras GTPase-activating proteins signature. 

CONSENSUS: [GSN]-x-[LIVMF]-[FY]-[LIVMFY]-R-[LIVMFY](2)-[GACN]-P-[AV]-[LIV](2)-
CONSENSUS: [SGAN]-P. 

NAME: Ras GTPase-activating proteins profile. 

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature. 

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[LIVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-
CONSENSUS: [DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature. 
CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM]. 

NAME: MARCKS family signature 1. 
CONSENSUS: G-Q-E-N-G-H-V-[KR]. 

NAME: MARCKS family phosphorylation site domain. 

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.

NAME: Stathmin family signature 1. 

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E. 

NAME: Stathmin family signature 2. 
CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V. 

NAME: GTP-binding elongation factors signature. 

CONSENSUS: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)- 
CONSENSUS: [GSTACKRNQ].

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.
CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G.

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.
CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1.

CONSENSUS: L-R-x(2)-T-[GDQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-[AV]-L.
NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN].
NAME: Elongation factor P signature. 

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.
NAME: Eukaryotic initiation factor 1A signature. 

CONSENSUS: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G.
NAME: Eukaryotic initiation factor 4E signature. 

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature.
CONSENSUS: [PT]-G-K-H-G-x-A-K.

NAME: Initiation factor 2 signature. 

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G. 
NAME: Initiation factor 3 signature. 

CONSENSUS: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQT]-x(2)-[KR].

NAME: Translation initiation factor SUI1 signature. 

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]. 

NAME: Prokaryotic-type class I peptide chain release factors signature. 
CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]. 

NAME: Transcription termination factor nusG signature. 
CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat. 

CONSENSUS: [LIVM]-x-[LS]-Q-[MAS]-G-[STY]-[NT]-[KRQ]-x(2)-[STN]-Q-x-G-x(3,4)-G.

NAME: CAP protein signature 1. 

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E. 

NAME: CAP protein signature 2. 

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.
NAME: Calreticulin family signature 1.

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2). 

NAME: Calreticulin family signature 2. 
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].

NAME: Calreticulin family repeated motif signature.

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V. 
NAME: Calsequestrin signature 2. 

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.
NAME: S-100/ICaBP type calcium binding protein signature.

CONSENSUS: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-
CONSENSUS: [LIVMFS]-[LIVMF].

NAME: Hemolysin-type calcium-binding region signature.
CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature. 

CONSENSUS: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-
CONSENSUS: [LIVMFYW](2)-x-[LIVMFYW](3). 

NAME: P-II protein urydylation site. 
CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y. 

NAME: P-II protein C-terminal region signature. 

CONSENSUS: [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM].
NAME: 14-3-3 proteins signature 1.

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].
NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]. 

NAME: ATP1G1 / PLM / MAT8 family signature. 

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVM]-[RQ]-x(2)-G. 

NAME: BTG1 family signature 1.

CONSENSUS: Y-x(2)-[HP]-W-[FY]-[AP]-E-x-P-x-K-G-x-[GA]-[FY]-R-C-[IV]-[RH]-[IV].
NAME: BTG1 family signature 2.

CONSENSUS: [LV]-P-x-[DE]-[LM]-[ST]-[LIVM]-W-[IV]-D-P-x-E-V-[SC]-x-[RQ]-x-G-E.
NAME: Cullin family signature.

CONSENSUS: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-
CONSENSUS: Y-x-[SA]>.

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature. 

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S. 
NAME: G10 protein signature 1. 

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P. 

NAME: G10 protein signature 2. 

CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].

NAME: Glucokinase regulatory protein family signature. 

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K. 
NAME: GTP1/OBG family signature. 

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G.
NAME: HIT family signature.

CONSENSUS: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-
CONSENSUS: [PSGA].

NAME: Caseins alpha/beta signature.
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A.

NAME: Clathrin adaptor complexes medium chain signature 1. 

CONSENSUS: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-
CONSENSUS: [LIVMT]-E.

NAME: Clathrin adaptor complexes medium chain signature 2.
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature.
CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1.

CONSENSUS: F-E-E-G-x-[LIVMF]-Y-[ED]-I-D-x(2)-N-[QE]-S-C-[RKH](2).
NAME: Ependymins signature 2.

CONSENSUS: [QE]-[LIVMA]-F-x(2)-P-[STA]-[FY]-C-[DE]-[GA]-[LIVM]-x(2)-[DE](2).
NAME: Syntaxin / epimorphin family signature.

CONSENSUS: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-
CONSENSUS: [LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.
CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.

CONSENSUS: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN].
NAME: Fetuin family signature 1.

CONSENSUS: C-x(56)-C-x(10)-C-x(13)-C-x(17,18)-C-x(13)-C-x(2)-C-x(58)-C-x(10,11)-
CONSENSUS: C-x(10,12)-C-x(16,22)-C.

NAME: Fetuin family signature 2.
CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.
NAME: Legume lectins beta-chain signature.
CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST].

NAME: Legume lectins alpha-chain signature. 

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST]. 
NAME: Vertebrate galactoside-binding lectin signature. 

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-
CONSENSUS: [DENKHS]-[LIVMFC].

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature.
CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature.

CONSENSUS: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-
CONSENSUS: x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature. 

CONSENSUS: I-I-x-[GAC]-V-M-A-G-[LIVM](2). 

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](4)-[SA]-T-x(2)-[DNKS]-x-W-x(9,13)-[LIV]-W-x(2)-C.
NAME: PMP-22 / EMP / MP20 family signature 2.

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature. 
CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A. 

NAME: Yeast PIR proteins repeats signature. 

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA].

NAME: Seminal vesicle protein I repeats signature. 

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV]. 

NAME: Seminal vesicle protein II repeats signature. 
CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature. 

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A. 

NAME: Spermadhesins family signature 1.
CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T. 

NAME: Spermadhesins family signature 2. 

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C. 

NAME: Stress-induced proteins SRP1/TIP1 family signature. 
CONSENSUS: P-W-Y-[ST](2)-R-L. 

NAME: Glypicans signature. 

CONSENSUS: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C.
NAME: Syndecans signature. 

CONSENSUS: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y. 
NAME: Tissue factor signature. 

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E. 
NAME: Translationally controlled tumor protein signature 1. 

CONSENSUS: [TA]-G-[GAS]-N-[PA]-S-A-E-[GDE]-[PAGE]-x(0,1)-[DEG]-x-[DEN]-x(2)-[DE].
NAME: Translationally controlled tumor protein signature 2.

CONSENSUS: [FL]-[FY]-[IVT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAS]-x-[LV]-[AV]-x(3)-[FY]-[KR]-
CONSENSUS: [DE].

NAME: Tub family signature 1.

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q. 
NAME: Tub family signature 2. 

CONSENSUS: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E. 

NAME: HCP repeats signature. 
CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7). 
NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T. 

NAME: Cell cycle proteins ftsW / rodA / spoVE signature. 

CONSENSUS: [NV]-x(5)-[GTR]-[LIVMA]-x-P-[PTLIVM]-x-G-[LIVM]-x(3)-[LIVMFW](2)-S-[YSA]-
CONSENSUS: G-G-[STN]-[SA].

NAME: Enterobacterial virulence outer membrane protein signature 1.
CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein signature 2.
CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F>.

NAME: Hydrogenases expression/synthesis hypA family signature. 

CONSENSUS: F-[CSA]-[FY]-[DE]-[LIVA](2)-x(3)-[ST]-[LIVM]-x(16)-C-x(2)-C-x(12,15)-
CONSENSUS: C-P-x-C. 

NAME: Hydrogenases expression/synthesis hupF/hypC family signature. 
CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV]. 

NAME: Staphylocoagulase repeat signature. 

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-x(3)-Y-G. 

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1.

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).

NAME: Dehydrins signature 2. 

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G. 

NAME: Germin family signature. 

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM].

NAME: Oleosins signature.

CONSENSUS: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-
CONSENSUS: P-A.

NAME: Small hydrophilic plant seed proteins signature.
CONSENSUS: G-[EQ]-T-V-V-P-G-G-T.

NAME: Pathogenesis-related proteins Bet v I family signature.

CONSENSUS: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-
CONSENSUS: [FY]. 

NAME: Pollen proteins Ole e I family signature. 
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R. 

NAME: Thaumatin family signature. 

CONSENSUS: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C.
NAME: Mrp family signature. 

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.

NAME: Glucose inhibited division protein A family signature 1.
CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2.

CONSENSUS: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A.
NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].
NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.
NAME: Protein smpB signature.

CONSENSUS: [TA]-G-[LIVM]-x-L-x-G-x-E-[LIVM]-[KQ]-[SA]-[LIVM].
NAME: Hypothetical cof family signature 1.

CONSENSUS: [LIVFYAN]-[LIVMFA]-x(2)-D-[LIVMF]-[ND]-G-T-[LIV]-[LVY]-[STANLM].
NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-
CONSENSUS: [LMP]-x(2)-[GAS].

NAME: RIO1/ZK632.3/MJ0444 family signature. 
CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]. 

NAME: SUA5/yciO/yrdC family signature. 

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].

NAME: Uncharacterized protein family UPF0001 signature.
CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].

NAME: Uncharacterized protein family UPF0003 signature. 

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-D-x(3)-[LIVT](3)-P-N-x(2)-[LIVMF](2)-
CONSENSUS: x(5)-N. 

NAME: Uncharacterized protein family UPF0004 signature. 

CONSENSUS: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G.
NAME: Uncharacterized protein family UPF0005 signature. 

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)- 
CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F. 

NAME: Uncharacterized protein family UPF0006 signature 1. 
CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]. 

NAME: Uncharacterized protein family UPF0006 signature 2. 
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3.

CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature.
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature. 
CONSENSUS: [GTA]-x(2)-[IVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G. 

NAME: Uncharacterized protein family UPF0015 signature. 

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q. 

NAME: Uncharacterized protein family UPF0016 signature. 
CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature. 

CONSENSUS: D-x(8)-[GN]-[LFY]-x(4)-[DET]-[LY]-Y-x(3)-[ST]-x(7)-[IV]-x(2)-[PS]-x- 
CONSENSUS: [LIVM]-x-[LIVM]-x(3)-[DN]-D. 

NAME: Uncharacterized protein family UPF0019 signature. 

CONSENSUS: L-P-V-[VT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM].

NAME: Uncharacterized protein family UPF0020 signature.
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.

NAME: Uncharacterized protein family UPF0021 signature. 
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D. 

NAME: Uncharacterized protein family UPF0023 signature. 
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G.

NAME: Uncharacterized protein family UPF0024 signature. 
CONSENSUS: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]. 

NAME: Uncharacterized protein family UPF0025 signature. 
CONSENSUS: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G. 

NAME: Uncharacterized protein family UPF0027 signature. 

CONSENSUS: Q-[LIVM]-x-N-x-A-x-[LIVM]-P-x-I-x(6)-[LIVM]-P-D-x-H-x-G-x-G-x(2)-[IV]-G.
NAME: Uncharacterized protein family UPF0028 signature.
CONSENSUS: [GA]-[GS]-G-[GA]-A-R-G-x-[SA]-H-x-G-x(9)-[IV]-x-[IV]-D-x(2)-[GA]-G-x-S-
CONSENSUS: x-G. 

NAME: Uncharacterized protein family UPF0029 signature. 

CONSENSUS: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-
CONSENSUS: G-x(2)-[LIVM]-G. 

NAME: Uncharacterized protein family UPF0030 signature. 
CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].

NAME: Uncharacterized protein family UPF0031 signature 1.

CONSENSUS: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT].

NAME: Uncharacterized protein family UPF0031 signature 2.
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].

NAME: Uncharacterized protein family UPF0032 signature.

CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM].

NAME: Uncharacterized protein family UPF0033 signature. 
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM]. 

NAME: Uncharacterized protein family UPF0034 signature. 

CONSENSUS: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC].

NAME: Uncharacterized protein family UPF0035 signature. 
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G. 

NAME: Uncharacterized protein family UPF0036 signature. 

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature.
CONSENSUS: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV].

NAME: Uncharacterized protein family UPF0044 signature.

CONSENSUS: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-
CONSENSUS: x(2)-G.

NAME: Uncharacterized protein family UPF0047 signature.
CONSENSUS: S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIV].

NAME: Uncharacterized protein family UPF0054 signature. 
CONSENSUS: H-[GS]-x-L-H-L-[LI]-G-[FYW]-D-H. 

NAME: Uncharacterized protein family UPF0057 signature. 

CONSENSUS: [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN].

NAME: Hypothetical YER057c/yjjV family signature.

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(2)-[AK]-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature.

CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NSK]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature.

CONSENSUS: [NHY]-R-[LI]-D-x(2)-T-[ST]-G-[LIVMA]-[LIVMF](2)-[LIVMFG]-[SGAC].
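
The CONSENSUS entries listed above appear to follow the PROSITE pattern syntax: residues are separated by hyphens, `x` stands for any residue, square brackets list the residues allowed at a position, a parenthesised number or range gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. As a minimal sketch under that assumption, such a pattern can be translated into a regular expression and used to scan a protein sequence. The function name `prosite_to_regex` and the test sequence are hypothetical and serve only as illustration; they are not part of the original disclosure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate one PROSITE-style consensus pattern into a Python regex.

    Illustrative helper (hypothetical name). Assumes a single, complete
    pattern with continuation lines already joined.
    """
    pattern = pattern.strip().rstrip(".").strip()
    anchored_start = pattern.startswith("<")  # '<' pins the N-terminus
    anchored_end = pattern.endswith(">")      # '>' pins the C-terminus
    pattern = pattern.lstrip("<").rstrip(">")

    parts = []
    for element in pattern.split("-"):
        # Separate the residue part from an optional repeat such as (2) or (1,2).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        residue, low, high = m.group(1), m.group(2), m.group(3)
        if residue == "x":
            token = "."                         # any residue
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"  # excluded residues
        else:
            token = residue                     # single letter or [ABC] class
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)

    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


# Cyclin-dependent kinases regulatory subunits signature 2, as listed above.
regex = prosite_to_regex("H-x-P-E-x-H-[IV]-L-L-F-[KR].")

# Hypothetical sequence, invented purely to exercise the pattern.
sequence = "AAHQPEPHILLFRGG"
match = re.search(regex, sequence)
```

Here `match.start()` gives the zero-based position of the first hit. Patterns that span two CONSENSUS lines in the listing would need to be concatenated before being passed to the function.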
We claim:

1. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16c16; hfbr2_16f21;
hfbr2_16g18; hfbr2_16i12; hfbr2_16k22; hfbr2_16l12; hfbr2_22f21; hfbr2_22h13;
hfbr2_22h13; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23b10; hfbr2_23b21;
hfbr2_23f2; hfbr2_23l24; hfbr2_23n16; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2b17;
hfbr2_2b5; hfbr2_2c1; hfbr2_2c17; hfbr2_2c18; hfbr2_2d15; hfbr2_2d17; hfbr2_2d20;
hfbr2_2g18; hfbr2_2h1; hfbr2_2h10; hfbr2_2i17; hfbr2_2k14; hfbr2_2k19; hfbr2_3b16;
hfbr2_3c18; hfbr2_3f16; hfbr2_3g8; hfbr2_3l2; hfbr2_41m15; hfbr2_62b11; hfbr2_62f10;
hfbr2_62l19; hfbr2_62n10; hfbr2_62o17; hfbr2_64a11; hfbr2_64a15; hfbr2_64c16;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64j18; hfbr2_64k24; hfbr2_64o16;
hfbr2_6a17; hfbr2_6b24; hfbr2_6i20; hfbr2_6o17; hfbr2_71o20; hfbr2_72b18;
hfbr2_72d13; hfbr2_72l12; hfbr2_72m16; hfbr2_72n12; hfbr2_78c24; hfbr2_78d13;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20;
hfbr1_10c20; hfbr2_82e17; hfbr1_10e17; hfbr2_82e4; hfbr1_10e4; hfbr2_82g14;
hfbr1_10g14; hfbr2_82i17; hfbr1_10; hfbr2_82i24; hfbr1_10; hfbr2_82m16; hfbr1_10;
hfbr2_82m6; hfbr1_10; hfkd2_1j9; hfkd2_24a15; hfkd2_24b15; hfkd2_24e23;
hfkd2_24n20; hfkd2_24p5; hfkd2_3i13; hfkd2_3o17; hfkd2_46a6; hfkd2_46b10;
hfkd2_46d13; hfkd2_46j20; hfkd2_46k19; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6;
hfkd2_4c8; hfkd2_4k14; hfkd2_4m11; hmcf1_1a11; hmcf1_1c23; hmcf1_1e15;
hmcf1_1g13; hhtes3_1n3; htes3_14g5; htes3_14h21; htes3_14p14; htes3_14p7;
htes3_15a13; Htes3_15c24; htes3_15c6; htes3_15g14; htes3_15h1; htes3_15i5;
htes3_15j18; Htes3_15j3; htes3_15k11; htes3_17f10; htes3_17l17; htes3_17n12;
htes3_17n18; Htes3_18f3; htes3_18l7; htes3_19f19; htes3_19j17; htes3_1c1; htes3_1g13;
htes3_1k11; htes3_20c21; htes3_20k2; htes3_20m18; htes3_21d4; htes3_21j15;
htes3_21l16; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22n13; htes3_23l11;
htes3_23n19; Htes3_23n19; htes3_26g22; htes3_27d1; htes3_27k4; htes3_27o14;
htes3_28d14; htes3_2a11; htes3_2a17; htes3_2d15; htes3_2e12; htes3_2f14; htes3_2g7;
htes3_2h1; htes3_2h15; htes3_2l19; htes3_2m18; htes3_2m20; htes3_2n9; htes3_2o13;
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35k16;
htes3_35k24; htes3_35n12; htes3_35n24; htes3_35n9; htes3_35p17; htes3_35p22;
htes3_4b4; htes3_4f17; htes3_4f5; htes3_4h6; htes3_4o19; htes3_50j4; htes3_50n06;
htes3_50n23; htes3_6b21; htes3_6c11; htes3_6d16; htes3_72k11; Htes3_72k15;
htes3_72p16; htes3_7b22; htes3_7d17; htes3_7j3; htes3_7j8; htes3_7p10; htes3_7p9;
htes3_8e24; Htes3_8g11; Htes3_8g5; htes3_8m10; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; hute1_17k7; hute1_18c12; hute1_18i19; hute1_18i4; hute1_18l1;
hute1_19f19; hute1_19g19; hute1_19g22; hute1_19h17; hute1_19j11; hute1_1i2;
hute1_20b19; hute1_20g21; hute1_20h13; hute1_20m11; hute1_20m24; hute1_21d15;
hute1_22d2; hute1_22e12; hute1_22n2; hute1_22o2; hute1_23e13; hute1_23g11;
hute1_24c19; hute1_24e11; hute1_24j6; hute1_2h3; their complements; and variants
thereof.

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16c16; hfbr2_16f21;
hfbr2_16g18; hfbr2_16i12; hfbr2_16k22; hfbr2_16l12; hfbr2_22f21; hfbr2_22h13;
hfbr2_22h13; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23b10; hfbr2_23b21;
hfbr2_23f2; hfbr2_23l24; hfbr2_23n16; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2;
hfbr2_2b17; hfbr2_2b5; hfbr2_2c1; hfbr2_2c17; hfbr2_2c18; hfbr2_2d15; hfbr2_2d17;
hfbr2_2d20; hfbr2_2g18; hfbr2_2h1; hfbr2_2h10; hfbr2_2i17; hfbr2_2k14; hfbr2_2k19;
hfbr2_3c18; hfbr2_3f16; hfbr2_3g8; hfbr2_3l2; hfbr2_41m15; hfbr2_62b11; hfbr2_62f10;
hfbr2_62l19; hfbr2_62n10; hfbr2_62o17; hfbr2_64a11; hfbr2_64a15; hfbr2_64c16;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64j18; hfbr2_64k24; hfbr2_64o16;
hfbr2_6a17; hfbr2_6b24; hfbr2_6i20; hfbr2_6o17; hfbr2_71o20; hfbr2_72b18;
hfbr2_72d13; hfbr2_72l12; hfbr2_72m16; hfbr2_72n12; hfbr2_78c24; hfbr2_78d13;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20;
hfbr1_10c20; hfbr2_82e17; hfbr1_10e17; hfbr2_82e4; hfbr1_10e4; hfbr2_82g14;
hfbr1_10g14; hfbr2_82i17; hfbr1_10; hfbr2_82i24; hfbr1_10; hfbr2_82m16; hfbr1_10;
hfbr2_82m6; hfbr1_10; their complements; and variants thereof.

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16f21; hfbr2_16k22; 
hfbr2_22f21; hfbr2_22h13; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23f2; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2c1; hfbr2_2c18; hfbr2_2d20; hfbr2_2g18; hfbr2_2h1;
hfbr2_2h10; hfbr2_2k19; hfbr2_3f16; hfbr2_3l2; hfbr2_62n10; hfbr2_64a11; hfbr2_64c16;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64o16; hfbr2_6a17; hfbr2_6i20; hfbr2_71o20;
hfbr2_72d13; hfbr2_72m16; hfbr2_72n12; hfbr2_78d13; hfbr2_78n23; hfbr2_7a24;
hfbr2_7e22; hfbr2_7j4; hfbr2_82m16; and hfbr1_10.

4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_1j9; hfkd2_24a15;
hfkd2_24b15; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3i13; hfkd2_3o17;
hfkd2_46a6; hfkd2_46b10; hfkd2_46d13; hfkd2_46j20; hfkd2_46k19; hfkd2_46m4;
hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4k14; hfkd2_4m11; their complements; and
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_1j9; hfkd2_24e23;
hfkd2_46a6; hfkd2_46b10; hfkd2_46d13; hfkd2_4b6; hfkd2_4c8; their complements; and
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcf1_1a11; hmcf1_1c23;
hmcf1_1e15; hmcf1_1g13; their complements; and variants thereof.

7. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcf1_1c23; hmcf1_1g13; their
complements; and variants thereof. 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hhtes3_1n3; htes3_14g5;
htes3_14h21; htes3_14p14; htes3_14p7; htes3_15a13; Htes3_15c24; htes3_15c6;
htes3_15g14; htes3_15h1; htes3_15i5; htes3_15j18; Htes3_15j3; htes3_15k11;
htes3_17f10; htes3_17l17; htes3_17n12; htes3_17n18; Htes3_18f3; htes3_18l7;
htes3_19f19; htes3_19j17; htes3_1c1; htes3_1g13; htes3_1k11; htes3_20c21; htes3_20k2;
htes3_20m18; htes3_21d4; htes3_21j15; htes3_21l16; htes3_21n23; htes3_22c23;
htes3_22g2; htes3_22n13; htes3_23l11; htes3_23n19; Htes3_23n19; htes3_26g22;
htes3_27d1; htes3_27k4; htes3_27o14; htes3_28d14; htes3_2a11; htes3_2a17; htes3_2d15;
htes3_2e12; htes3_2f14; htes3_2g7; htes3_2h1; htes3_2h15; htes3_2l19; htes3_2m18;
htes3_2m20; htes3_2n9; htes3_2o13; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21;
htes3_35g6; htes3_35k16; htes3_35k24; htes3_35n12; htes3_35n24; htes3_35n9;
htes3_35p17; htes3_35p22; htes3_4b4; htes3_4f17; htes3_4f5; htes3_4h6; htes3_4o19;
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6c11; htes3_6d16; htes3_72k11;
Htes3_72k15; htes3_72p16; htes3_7b22; htes3_7d17; htes3_7j3; htes3_7j8; htes3_7p10;
htes3_7p9; htes3_8e24; Htes3_8g11; Htes3_8g5; htes3_8m10; Htes3_8p7; Htes3_9e22;
Htes3_9i20; Htes3_9k22; their complements; and variants thereof.

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: htes3_14g5; htes3_14p14;
htes3_14p7; htes3_15a13; htes3_15g14; htes3_15h1; htes3_15j18; htes3_17n0; Htes3_18f3;
htes3_19f19; htes3_19j17; htes3_20c21; htes3_21n23; htes3_22c23; htes3_22n13;
Htes3_23n19; htes3_27o14; htes3_28d14; htes3_2a11; htes3_2d15; htes3_2f14; htes3_2g7;
htes3_2h15; htes3_2l19; htes3_2m20; htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24;
htes3_35p17; htes3_4b4; htes3_4f17; htes3_4o19; htes3_50j4; htes3_50n23; htes3_50n06;
htes3_6b21; htes3_6d16; htes3_72k11; htes3_7d17; htes3_7j8; Htes3_8g11; Htes3_8g5;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16g18; hfbr2_2k14;
Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7p10; hute1_20m11; their complements; and
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16c16; hfbr2_2b5;
htes3_15i5; htes3_18l7; htes3_1k11; Htes3_72k15; htes3_7b22; hute1_19g22; hute1_24j6;
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_2d15; htes3_35e21;
hute1_2h3; their complements; and variants thereof.

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23l24; hfbr2_2i17;
hfbr2_41m15; hfbr2_62f10; hfbr2_62l19; hfbr2_64j18;
hfkd2_24n20; hfkd2_24p5; hfkd2_4k14; htes3_1g13; htes3_21l16; htes3_23l11;
htes3_26g22; htes3_4h6; htes3_72p16; hute1_19h17; hute1_20h13; hute1_24e11; their
complements; and variants thereof. 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_3g8; hfbr2_62o17;
hfbr2_6b24; hfbr2_78k24; hfkd2_24b15; hfkd2_3o17; hfkd2_46j20; htes3_17l17;
htes3_17n18; htes3_27d1; htes3_2a17; htes3_35b5; htes3_35k16; htes3_35n12;
htes3_35n9; hute1_20b19; hute1_20m24; hute1_23e13; their complements; and variants
thereof. 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b10; hfbr2_3c18;
hfbr2_64a15; hfbr2_6o17; hfbr2_72b18; hfbr2_72l12; hfbr2_82i24 (hfbr1_10);
htes3_14h21; Htes3_15j3; htes3_20m18; htes3_22g2; htes3_2m18; htes3_7p9;
htes3_8m10; hute1_18l1; their complements; and variants thereof.

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b21; hfbr2_23n16;
hfbr2_2c17; hfbr2_62b11; hfbr2_78c24; hfbr2_82e4 (hfbr1_10e4); hfbr2_82i17
(hfbr1_10); hfbr2_82m6 (hfbr1_10); hfkd2_46m4; htes3_15k11; htes3_1c1; hhtes3_1n3;
htes3_20k2; htes3_21d4; htes3_23n19; htes3_4f5; htes3_6c11; htes3_8e24; hute1_20g21;
hute1_22d2; hute1_22e12; their complements; and variants thereof.

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16i12; hfbr2_16l12;
hfbr2_22h13; hfbr2_2b17; hfbr2_2d17; hfbr2_64k24; hfbr2_82c20 (hfbr1_10c20);
hfbr2_82e17 (hfbr1_10e17); hfbr2_82g14 (hfbr1_10g14); hfkd2_24a15; hfkd2_3i13;
hfkd2_4m11; hmcf1_1a11; hmcf1_1e15; htes3_15c6; htes3_2o13; htes3_27k4; htes3_2h1;
htes3_35k24; hute1_19f19; hute1_24c19; their complements; and variants thereof.

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_46k19; hfkd2_47a4;
htes3_2e12; htes3_21j15; htes3_17n12; hute1_18i19; hute1_1i2; their complements; and
variants thereof. 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; hutel_19gl9; hutel_19g22; 
hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; hutel_20g21; hutel_20hl3; 
hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; hutel_22el2; hutel_22n2; 
hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; hutel_24ell; hutel_24j6; 
hutel_2h3; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; hutel_21dl5; hutel_22o2; 
hutel_23gll; their complements; and variants thereof. 

21. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; 
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; 
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; 
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; 
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; 
hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; 
hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; 
hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; 
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; 
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5; 
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; 
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; 
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; 
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; 
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; complements of the nucleic acid 
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16f21; hfbr2_16k22; hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23f2; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; 
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2k19; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; 
hfbr2_64all; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64k24; 
hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; hfbr2_72dl3; hfbr2_72ml6; 
hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; 
hfbrl_10; complements of the nucleic acid sequences; and variants thereof. 

24. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; complements of the nucleic acid sequences; and variants thereof. 

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hfkd2_lj9; 
hfkd2_24e23; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; 
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; complements of the nucleic acid 
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lc23; hmcfl_lgl3; complements of the nucleic acid sequences; and variants thereof. 

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; 
Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; 
htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; 
htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; 
htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; 
htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; 
htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; 
htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; 
htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; 
htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; 
htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; 
htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; 
htes3_6dl6; htes3_72kll; Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; 
htes3_7j8; htes3_7pl0; htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; 
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; complements of the nucleic acid 
sequences; and variants thereof. 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: htes3_14g5; 
htes3_14pl4; htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; 
htes3_17fl0; htes3_17nl8; Htes3_18f3; htes3_19fl9; htes3_19jl7; htes3_20c21; 
htes3_21n23; htes3_22c23; htes3_22nl3; Htes3_23nl9; htes3_27ol4; htes3_28dl4; 
htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7; htes3_2hl5; htes3_2119; htes3_2m20; 
htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; htes3_35pl7; htes3_4b4; htes3_4fl7; 
htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; htes3_6b21; htes3_6dl6; htes3_72kll; 
htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16gl8; hfbr2_2kl4; Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; 
hutel_20mll; complements of the nucleic acid sequences; and variants thereof. 

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_2b5; htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; 
hutel_19g22; hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

32. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_2dl5; htes3_35e21; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

33. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23124; hfbr2_2il7; hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8; 
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111; 
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; 
complements of the nucleic acid sequences; and variants thereof. 

34. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_3g8; hfbr2_62ol7; hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; 
hfkd2_46j20; htes3_17117; Htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; 
htes3_35kl6; htes3_35nl2; htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; 
complements of the nucleic acid sequences; and variants thereof. 

35. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23bl0; hfbr2_3cl8; hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; 
hfbr2_82i24 (hfbrl_10); htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; 
htes3_7p9; htes3_8ml0; hutel_1811; complements of the nucleic acid sequences; and 
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23b21; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 
(hfbrl_10e4); hfbr2_82il7 (hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; 
htes3_lcl; hhtes3_ln3; htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; 
htes3_8e24; hutel_20g21; hutel_22d2; hutel_22el2; complements of the nucleic acid 
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16il2; hfbr2_16112; hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; 
hfbr2_82c20 (hfbrl_10c20); hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); 
hfkd2_24al5; hfkd2_3il3; hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; 
htes3_2ol3; htes3_27k4; htes3_2hl; htes3_35k24; hutel_19fl9; and hutel_24cl9; 
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; hfkd2_47a4; htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; 
hutel_li2; complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; 
hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; 
hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; 
hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; 
hutel_24ell; hutel_24j6; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

40. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; 
hutel_21dl5; hutel_22o2; hutel_23gll; complements of the nucleic acid sequences; and 
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the 
group consisting of hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; 
hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; 
hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; 
hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; 
hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; 
hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; 
hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; 
hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; 
hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; 
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; 
hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; 
hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; 
hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; 
hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; hhtes3_ln3; 
htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; 
htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817; 
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; 
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; 
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; 
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; 
Htes3_9i20; Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

42. A polypeptide encoded by the nucleic acid molecule according to claim 41. 

43. An antibody or fragment thereof that is capable of binding to a specific portion 
of the peptide according to claim 42. 

44. A pharmaceutical composition, comprising (a) an effective amount of a 
pharmaceutical agent, wherein said pharmaceutical agent is selected from the group consisting 
of the polypeptide according to claim 42, variants or functional derivatives thereof, and 
antibodies thereto; and (b) a physiologically acceptable carrier or excipient. 

45. An expression vector comprising the nucleic acid molecule of claim 41 or a 
fragment thereof, and optionally a promoter operably linked to said nucleic acid molecule or 
said fragment. 

46. A method for recombinantly producing a desired peptide, comprising expressing 
in a host cell a peptide encoded by the nucleic acid molecule according to claim 41. 